To begin, we wish to credit Julia Silge for much of the material presented here. Her post on tidytext
data analysis and the code therein inspired most of this portion of the workshop.
Let’s begin by downloading two documents written by our dear Dr. Paul Snelgrove. The first is his book entitled ‘Discoveries of the Census of Marine Life: Making Ocean Life Count’ and the second is his most cited paper (n = 886 according to Google Scholar), ‘Getting to the Bottom of Marine Biodiversity: Sedimentary Habitats: Ocean bottoms are the most widespread habitat on Earth and support high biodiversity and key ecosystem services’.
# Download file from the web
download.file('http://www.cambridge.org/download_file/153663','Snelgrove_Text_Only.pdf', mode = 'wb')
download.file('https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/bioscience/49/2/10.2307/1313538/2/49-2-129.pdf?Expires=1493740003&Signature=Jp0aTSea-mCne3vgy6tE2JYz58l-K4iWPF3gnd8r-GagpywgUq8U9WcImKsSa~bDZOj5mY3216xNoqSDWXGLBdumI-WIRvUrHFKj1AetoS-Rsyup0NVO9nWE0te8dsIYDWEKkUXvu7-9xdHnmpa5QSNUlE8kM8V~4B68mdJR7W0eE-at~GH4p7IrPRDeAN9n8U~I2Kd-s~KkYgAV6ASxvzXhK4sLUaja~xs3n5eXdhTrcaJINFOxk2~2nU17jDgXM6PUw3W5Epvdywz4s6~eU4IyWCst4sokZAV3OczO7NiagYdKq3foTi~Y-EP~c0uvPr1gRFoxT6itFNGSSZCk6A__&Key-Pair-Id=APKAIUCZBIA4LVPAVW3Q/', '49-2-129.pdf', mode = 'wb')
The package pdftools, available through rOpenSci, allows you to read the content of a PDF rather easily. See the package details for more functions.
# Read pdf file
book <- pdf_text("Snelgrove_Text_Only.pdf")
paper <- pdf_text("49-2-129.pdf")
book[1]
[1] " Discoveries of the Census of Marine Life:\n Making Ocean Life Count\nOver the 10-year course of the recently completed Census of Marine Life,\na global network of researchers in more than 80 nations has collaborated\nto improve our understanding of marine biodiversity – past, present, and\nfuture.\n Providing insight into this remarkable project, this book explains\nthe rationale behind the Census and highlights some of its most important\nand dramatic findings, illustrated with full-color photographs throughout.\nIt explores how new technologies and partnerships have contributed to\ngreater knowledge of marine life, from unknown species and habitats, to\nmigration routes and distribution patterns, and to a better appreciation of\nhow the oceans are changing. Looking to the future, it identifies-what\nneeds to be done to close the remaining gaps in our knowledge, and\nprovides information that will enable us to manage resources more\neffectively, conserve diversity, reverse habitat losses, and respond to\nglobal climate change.\n PAUL SNELGROVE is a Professor in Memorial University of\nNewfoundland’s Ocean Sciences Centre and Biology Department. He\nchaired the Synthesis Group of the Census of Marine Life that has\noverseen the final phase of the program. He is now Director of the\nNSERC Canadian Healthy Oceans Network, a research collaboration of\n"
length(book)
[1] 398
paper[1]
[1] " Getting to the Bottom of Marine\n Biodiversity: Sedimentary Habitats\n Oceanbottomsare the most widespreadhabitaton Earthand\n support high biodiversityand key ecosystemservices\n Paul V. R. Snelgrove\nT heoceansencompasshabitats I\n Living in marine sediments\n ranging from highly produc-\n tive coastal regions to lightless, Estimates of total Organisms that live in marine sedi-\n ments face numerous challenges.\nhigh-pressure, and low-temperature\ndeep-sea environments. The benthic species numbers Except in the shallowest areas, where\n there is sufficient light to allow pho-\n(bottom-living) species that reside suggest that less\nwithin the sediments in these habi- tosynthesis at the bottom, most sedi-\n mentary organisms are dependent on\ntats form one of the richest species than 1 % of marine phytoplankton and other organic\npools in the oceans and perhaps on\nEarth. Even though 70.8% of the benthic species are material sinking down from surface\nearth is covered by oceans, and most waters above. The spatial decoupling\nocean floor is covered by sediments, presently known of production from most marine\n benthic environments makes these\nthere is still much to learn about\n environments fundamentally differ-\nbiodiversity in marine sediments. The\n terns are thought to exist, and why ent from those of terrestrial (Wall\nmajor reasons for the gaps in knowl-\n we should care. Further discussions and Moore 1999) and freshwater\nedge are logistics and effort. Ap-\n of marine biodiversity (NRC 1995), (Covich et al. 1999) benthos. With\nproximately 65.5% of the planet is\ncovered by ocean that is greater than and biodiversity in marine sediments increasing water depth, the amount\n130 m in depth (i.e., the approxi- in particular (Snelgrove et al. 1997), of material reaching the bottom de-\nmate depth limit of the continental may be found elsewhere. 
creases; most deep-sea sedimentary\n The oceans harbor tremendous environments are thought to be food\nshelf) and is accessible only by sub-\nmersibles or remote-sampling gear. biological diversity. Of the 29 limited.\nEven the remaining shallow areas nonsymbiont animal phyla that have To take advantage of whatever\n been described so far, all but one has food is present, some organisms (sus-\n(i.e., approximately 5% of the earth's\nsurface) present challenges in terms living representatives in the ocean, pension feeders) are able to remove\nof ship availability and cost, as well and 13 are represented only in the suspended particles from near-bot-\nas loss of experiments and ship time oceans; all of these phyla have repre- tom water; others (deposit feeders)\nto weather. sentatives in the benthos, and most rely on particles that have settled\n have representatives in marine sedi- onto the bottom. Some mega- and\n Despite these logistical difficul- ments. Most of the species diversity macrofaunal species suspension feed,\nties, it is important to improve our in marine ecosystems consists of in-\nunderstanding of biodiversity in many deposit feed, and a few\nmarine sediments. In this article, I vertebrates residing in (infauna) and macrofaunal species do both. Meio-\ndescribe the biodiversity of organ- on (epifauna) sediments. These in- fauna and microbiota depend on de-\nisms residing in the marine sedimen- vertebrates include large animals posited organic material. The mobil-\ntary environment, the patterns that (megafauna), such as scallops and ity of many benthic organisms is\nhave been observed, why these pat- crabs, that can be identified from relatively limited; many are sessile,\n bottom photographs. However, most and others have only limited mobil-\nPaulV. R. Snelgrove(psnelgro@gill.ifmt. species are polychaetes, crustaceans, ity within sediments. 
As a result, many\n mollusks (macrofauna, larger than benthic species rely completely on the\nnf.ca) is an associate chair of Fisheries\nConservationin the Fisheriesand Marine 300 gim), and tiny crustaceans and water above them to supply food.\nInstitute, Memorial University of New- nematodes (meiofauna, 44-300 gim). Water also supplies oxygen, a ba-\nfoundland, Box 4920, St. John's, New- In addition, there are the poorly known sic requirement for most organisms\nfoundland, Canada AiC 5R3. ? 1999 microbiota (smaller than 44 ,um), residing in sediments. As organisms\nAmericanInstituteof BiologicalSciences. which include bacteria and protists. respire and use up oxygen, sediments\nFebruary 1999 129\n"
length(paper)
[1] 10
As you can see, the result is coerced into a character vector with one string per page: 398 pages for the book and 10 for the paper, with lines separated by ‘\n’. If you look at the actual PDFs, you will also notice that tables are not imported into R by the pdf_text
function. Extracting data embedded in PDF tables can be highly useful; if you wish to do so, take a look at the package tabulizer
, which we will not cover in this workshop.
Now let’s tidy up the text to make it more easily usable for further analyses.
# Divide strings per line using '\n' as a separator
book <- str_split(book, '\n')
paper <- str_split(paper, '\n')
book[[1]][1:10]
[1] " Discoveries of the Census of Marine Life:"
[2] " Making Ocean Life Count"
[3] "Over the 10-year course of the recently completed Census of Marine Life,"
[4] "a global network of researchers in more than 80 nations has collaborated"
[5] "to improve our understanding of marine biodiversity – past, present, and"
[6] "future."
[7] " Providing insight into this remarkable project, this book explains"
[8] "the rationale behind the Census and highlights some of its most important"
[9] "and dramatic findings, illustrated with full-color photographs throughout."
[10] "It explores how new technologies and partnerships have contributed to"
# Trim whitespaces at the beginning and end of lines
book <- lapply(X = book, FUN = str_trim, side = 'both')
paper <- lapply(X = paper, FUN = str_trim, side = 'both')
book[[1]][1:10]
[1] "Discoveries of the Census of Marine Life:"
[2] "Making Ocean Life Count"
[3] "Over the 10-year course of the recently completed Census of Marine Life,"
[4] "a global network of researchers in more than 80 nations has collaborated"
[5] "to improve our understanding of marine biodiversity – past, present, and"
[6] "future."
[7] "Providing insight into this remarkable project, this book explains"
[8] "the rationale behind the Census and highlights some of its most important"
[9] "and dramatic findings, illustrated with full-color photographs throughout."
[10] "It explores how new technologies and partnerships have contributed to"
# Transform as a matrix
bookMat <- matrix(nrow = 0, ncol = 3, dimnames = list(c(), c('text','page','document')))
for(i in 1:length(book)) {
bk <- cbind(book[[i]], rep(i, length(book[[i]])), 'Discoveries of the Census of Marine Life')
bookMat <- rbind(bookMat, bk)
}
paperMat <- matrix(nrow = 0, ncol = 3, dimnames = list(c(), c('text','page','document')))
for(i in 1:length(paper)) {
bk <- cbind(paper[[i]], rep(i, length(paper[[i]])), 'Getting to the Bottom of Marine Biodiversity')
paperMat <- rbind(paperMat, bk)
}
kable(bookMat[1:10, ])
text | page | document |
---|---|---|
Discoveries of the Census of Marine Life: | 1 | Discoveries of the Census of Marine Life |
Making Ocean Life Count | 1 | Discoveries of the Census of Marine Life |
Over the 10-year course of the recently completed Census of Marine Life, | 1 | Discoveries of the Census of Marine Life |
a global network of researchers in more than 80 nations has collaborated | 1 | Discoveries of the Census of Marine Life |
to improve our understanding of marine biodiversity – past, present, and | 1 | Discoveries of the Census of Marine Life |
future. | 1 | Discoveries of the Census of Marine Life |
Providing insight into this remarkable project, this book explains | 1 | Discoveries of the Census of Marine Life |
the rationale behind the Census and highlights some of its most important | 1 | Discoveries of the Census of Marine Life |
and dramatic findings, illustrated with full-color photographs throughout. | 1 | Discoveries of the Census of Marine Life |
It explores how new technologies and partnerships have contributed to | 1 | Discoveries of the Census of Marine Life |
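Growing a matrix with rbind() inside a loop works, but it copies the whole matrix at every iteration and becomes slow on long documents. A base-R sketch of an equivalent one-pass approach, using a toy two-page list standing in for the real book object:

```r
# Toy stand-in for the list of page-wise line vectors produced by str_split()
book <- list(c("line one", "line two"), c("line three"))

# Build one small matrix per page, then bind them all at once
pages <- lapply(seq_along(book), function(i) {
  cbind(text = book[[i]],
        page = i,
        document = 'Discoveries of the Census of Marine Life')
})
bookMat <- do.call(rbind, pages)
```

The result has the same three columns ('text', 'page', 'document') as the loop version, with one row per line of text.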
# Remove empty strings
bookMat[bookMat[,'text'] == '', 'text'] <- NA
bookMat <- na.omit(bookMat)
paperMat[paperMat[,'text'] == '', 'text'] <- NA
paperMat <- na.omit(paperMat)
# Convert to data.frame
bookMat <- as.data.frame(bookMat)
bookMat[, 'text'] <- as.character(bookMat[, 'text'])
bookMat[, 'page'] <- as.numeric(paste(bookMat[, 'page']))
bookMat[, 'document'] <- as.character(paste(bookMat[, 'document']))
kable(bookMat[1:10, ])
text | page | document |
---|---|---|
Discoveries of the Census of Marine Life: | 1 | Discoveries of the Census of Marine Life |
Making Ocean Life Count | 1 | Discoveries of the Census of Marine Life |
Over the 10-year course of the recently completed Census of Marine Life, | 1 | Discoveries of the Census of Marine Life |
a global network of researchers in more than 80 nations has collaborated | 1 | Discoveries of the Census of Marine Life |
to improve our understanding of marine biodiversity – past, present, and | 1 | Discoveries of the Census of Marine Life |
future. | 1 | Discoveries of the Census of Marine Life |
Providing insight into this remarkable project, this book explains | 1 | Discoveries of the Census of Marine Life |
the rationale behind the Census and highlights some of its most important | 1 | Discoveries of the Census of Marine Life |
and dramatic findings, illustrated with full-color photographs throughout. | 1 | Discoveries of the Census of Marine Life |
It explores how new technologies and partnerships have contributed to | 1 | Discoveries of the Census of Marine Life |
paperMat <- as.data.frame(paperMat)
paperMat[, 'text'] <- as.character(paperMat[, 'text'])
paperMat[, 'page'] <- as.numeric(paste(paperMat[, 'page']))
paperMat[, 'document'] <- as.character(paste(paperMat[, 'document']))
kable(paperMat[1:10, ])
text | page | document |
---|---|---|
Getting to the Bottom of Marine | 1 | Getting to the Bottom of Marine Biodiversity |
Biodiversity: Sedimentary Habitats | 1 | Getting to the Bottom of Marine Biodiversity |
Oceanbottomsare the most widespreadhabitaton Earthand | 1 | Getting to the Bottom of Marine Biodiversity |
support high biodiversityand key ecosystemservices | 1 | Getting to the Bottom of Marine Biodiversity |
Paul V. R. Snelgrove | 1 | Getting to the Bottom of Marine Biodiversity |
T heoceansencompasshabitats I | 1 | Getting to the Bottom of Marine Biodiversity |
Living in marine sediments | 1 | Getting to the Bottom of Marine Biodiversity |
ranging from highly produc- | 1 | Getting to the Bottom of Marine Biodiversity |
tive coastal regions to lightless, Estimates of total Organisms that live in marine sedi- | 1 | Getting to the Bottom of Marine Biodiversity |
ments face numerous challenges. | 1 | Getting to the Bottom of Marine Biodiversity |
# Bind as a single data.frame
paul <- rbind(bookMat, paperMat)
The resulting data for the paper are not perfect (e.g. hyphenated words are not rejoined), but they will serve our purposes!
Now that we have our documents as an R data frame, we can start playing around with them. Multiple packages allow you to perform text analyses. We will focus on the package tidytext
as it aligns with the tidyverse, but the package tm
is another important package for text analysis in R. Each package uses certain object classes, but objects can be converted quite easily between packages. See this vignette for a description of the steps and functions used to achieve this.
We will first transform the data into a tibble class object for use in tidytext
# Convert to tibble class object for use in tidytext
paul <- as_tibble(paul)
paul
The book can easily be grouped by certain criteria. For example, we will start by grouping the data according to the part of the book to which they belong; there are a total of three parts to this book. Unfortunately, this particular process does not work for our PDF documents, but the code does work and we might use it again elsewhere, let’s say for an exercise (cough cough).
paulDoc <- paul %>%
mutate(linenumber = row_number(),
chapter = cumsum(str_detect(text, regex("^part [\\divxlc]", ignore_case = TRUE)))) %>%
ungroup()
paulDoc
unique(paulDoc[, 'chapter'])
This approach is not useful with these documents because they are not divided into chapters and it does not differentiate between documents, yet the regex code can still be useful to divide your text however you see fit.
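To see what the regex itself does, here is a minimal base-R sketch (using grepl() in place of str_detect(), on a few invented lines): lines beginning with “part” followed by a roman numeral or digit flip a flag, and the running sum assigns every line to a part.

```r
lines <- c("PART I", "some text", "part ii", "more text", "Part III")

# TRUE wherever a line starts with 'part' plus a digit or roman numeral
is_part <- grepl("^part [[:digit:]ivxlc]", lines, ignore.case = TRUE)

# Running count: each line inherits the number of the last part heading seen
chapter <- cumsum(is_part)
chapter
```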
A frequent analysis performed on text is to evaluate the frequency of the words it uses. This can be done by first unnesting the individual words from the text with the unnest_tokens
function and then counting them with the count
function.
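Conceptually, unnest_tokens() lowercases and splits each line into words and count() tallies them. A base-R sketch of the same idea, on a toy pair of lines (the crude tokenizer here is an assumption, not tidytext’s actual tokenizer):

```r
lines <- c("Marine life counts", "marine biodiversity matters")

# Crude tokenizer: lowercase, then split on anything that is not a letter
words <- unlist(strsplit(tolower(lines), "[^a-z']+"))
words <- words[words != ""]

# Tally word frequencies, most frequent first
wordCount <- sort(table(words), decreasing = TRUE)
wordCount
```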
# Count per word
paulWord <- paul %>%
unnest_tokens(word, text) %>% # divide by word
count(word, sort = TRUE) # counts the word frequency
paulWord
# Total number of words
totalWords <- paulWord %>% summarize(total = sum(n))
totalWords
You could also apply this directly to the data grouped by document to obtain a word count per document.
# Count per word per document
paulWordDocs <- paul %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
count(document, word, sort = TRUE) %>%
ungroup()
# Attach total number of words per document
totalWords <- paulWordDocs %>% group_by(document) %>% summarize(total = sum(n))
totalWords
paulWordDocs <- left_join(paulWordDocs, totalWords)
Joining, by = "document"
paulWordDocs
ggplot(paulWordDocs, aes(n/total, fill = document)) +
geom_histogram(show.legend = FALSE) +
xlim(NA, 0.0009) +
facet_wrap(~document, ncol = 2, scales = "free_y")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 290 rows containing non-finite values (stat_bin).
Words like ‘the’, ‘and’, and ‘of’ are by far the most common. These are referred to as stop words and are usually removed before text analysis. A list of such words is available in the stop_words
dataset and they can be removed from your dataset using the anti_join
function.
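The effect of anti_join(stop_words) can be illustrated in base R with %in%, using a toy word vector and a tiny stand-in stop list (the real stop_words dataset contains over a thousand entries):

```r
words <- c("the", "oceans", "and", "the", "benthos")
stopList <- c("the", "and", "of")  # tiny stand-in for tidytext::stop_words

# Keep only the words that are NOT in the stop list
kept <- words[!words %in% stopList]
kept
```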
# Count per word after stop words removal
data("stop_words")
paulWordDocs <- paul %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>% # remove stop words
count(document, word, sort = TRUE) %>%
ungroup()
Joining, by = "word"
totalWords <- paulWordDocs %>% group_by(document) %>% summarize(total = sum(n))
paulWordDocs <- left_join(paulWordDocs, totalWords)
Joining, by = "document"
paulWordDocs
ggplot(paulWordDocs, aes(n/total, fill = document)) +
geom_histogram(show.legend = FALSE) +
xlim(NA, 0.0009) +
facet_wrap(~document, ncol = 2, scales = "free_y")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 326 rows containing non-finite values (stat_bin).
Warning: Removed 2 rows containing missing values (geom_bar).
The frequency of words could also be compared between documents.
paulBook <- as_tibble(bookMat)
paulPaper <- as_tibble(paperMat)
paulBookWord <- paulBook %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>% # remove stop words
count(word, sort = TRUE) %>%
ungroup()
Joining, by = "word"
paulPaperWord <- paulPaper %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>% # remove stop words
count(word, sort = TRUE) %>%
ungroup()
Joining, by = "word"
frequency <- paulBookWord %>%
rename(Book = n) %>%
inner_join(paulPaperWord) %>%
rename(Paper = n) %>%
mutate(Book = Book / sum(Book),
Paper = Paper / sum(Paper)) %>%
ungroup()
Joining, by = "word"
ggplot(frequency, aes(x = Book, y = Paper, color = abs(Paper - Book))) +
geom_abline(color = "gray40") +
geom_jitter(alpha = 0.1, size = 2.5, width = 0.4, height = 0.4) +
geom_text(aes(label = word), check_overlap = TRUE, vjust = 1.5) +
scale_x_log10(labels = percent_format()) +
scale_y_log10(labels = percent_format()) +
scale_color_gradient(limits = c(0, 0.001), low = "darkslategray4", high = "gray75") +
theme_minimal(base_size = 14) +
theme(legend.position="none") +
labs(title = "Comparing Word Frequencies",
subtitle = "Word frequencies in Paul Snelgroves's book and paper",
y = "Getting to the Bottom of Marine Biodiversity", x = "Discoveries of the Census of Marine Life")
tidytext
also gives you the opportunity to perform a cursory sentiment analysis, i.e. evaluating whether the text is more positive or negative, using the sentiments
dataset. While it may not be as useful to qualify science as positive or negative, it may reveal some insights into the author’s overall writing style (we are looking at you, Dr. Snelgrove).
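At its core this sentiment analysis is a dictionary lookup: each word is matched against a lexicon of positive and negative words, and the score is the count of positive matches minus the count of negative ones. A base-R sketch with an invented mini-lexicon (the real bing lexicon has thousands of entries):

```r
# Invented mini-lexicon: word -> sentiment label
lexicon <- c(rich = "positive", loss = "negative",
             healthy = "positive", gaps = "negative")

words <- c("rich", "loss", "loss", "healthy", "unknown")

# Look up only the words present in the lexicon ("unknown" is dropped)
matched <- lexicon[words[words %in% names(lexicon)]]

# Net sentiment: positive matches minus negative matches
score <- sum(matched == "positive") - sum(matched == "negative")
score
```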
# Gather list of sentiments from tidytext
bing <- sentiments %>%
filter(lexicon == "bing") %>%
dplyr::select(-score)
bing
paulSentiment <- paul %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>% # remove stop words
inner_join(bing) %>% # join with sentiment dataset
count(document, index = linenumber %/% 80, sentiment) %>%
spread(sentiment, n, fill = 0) %>%
mutate(sentiment = positive - negative)
Joining, by = "word"
Joining, by = "word"
paulSentiment
# plot
ggplot(paulSentiment, aes(index, sentiment, fill = document)) +
geom_bar(stat = "identity", show.legend = FALSE) +
facet_wrap(~document, ncol = 2, scales = "free_x") +
theme_minimal(base_size = 13) +
labs(title = "Sentiment in Paul Snelgrove's writing",
y = "Sentiment") +
scale_fill_viridis(end = 0.75, discrete=TRUE, direction = -1) +
scale_x_discrete(expand=c(0.02,0)) +
theme(strip.text=element_text(hjust=0)) +
theme(strip.text = element_text(face = "italic")) +
theme(axis.title.x=element_blank()) +
theme(axis.ticks.x=element_blank()) +
theme(axis.text.x=element_blank())
# Most common positive and negative words
paulSentimentCount <- paul %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>% # remove stop words
inner_join(bing) %>% # join with sentiment dataset
count(document, word, sentiment, sort = TRUE) %>%
ungroup()
Joining, by = "word"
Joining, by = "word"
# Contribution to sentiment
paulSentimentCount %>%
filter(n > 10) %>%
mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(word, n, fill = sentiment)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ylab("Contribution to sentiment")
Words are not the only units of text that can be extracted using the unnest_tokens
function. Look at the package vignette for more information!
paulSentences <- paul %>%
group_by(document) %>%
unnest_tokens(sentence, text, token = "sentences") %>%
ungroup()
paulSentences$sentence[1]
[1] "discoveries of the census of marine life: making ocean life count over the 10-year course of the recently completed census of marine life, a global network of researchers in more than 80 nations has collaborated to improve our understanding of marine biodiversity – past, present, and future."
You can also identify which words are most often used together with the function pairwise_count
from the package widyr
paulWord <- paul %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
anti_join(stop_words)
paulWordOcc <- pairwise_count(paulWord, word, linenumber, sort = TRUE)
set.seed(1813)
paulWordOcc %>%
filter(n >= 25) %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha = n, edge_width = n)) +
geom_node_point(color = "darkslategray4", size = 5) +
geom_node_text(aes(label = name), vjust = 1.8) +
ggtitle(expression(paste("Word Network in Paul Snelgrove's book ",
italic("Census of Marine Life")))) +
theme_void()
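Under the hood, pairwise_count() tallies, for each pair of words, the number of groups (here, lines) in which both appear. A base-R sketch of the same idea on a toy two-line corpus:

```r
# Toy corpus: each element is the set of words on one line
lines <- list(c("marine", "life"),
              c("marine", "biodiversity", "life"))

# For each line, enumerate every unordered pair of distinct words
pairs <- do.call(rbind, lapply(lines, function(w) {
  t(combn(sort(unique(w)), 2))
}))

# Count how many lines each pair co-occurs in
pairCount <- table(paste(pairs[, 1], pairs[, 2]))
pairCount
```

Here “life” and “marine” co-occur on both lines, so their pair count is 2, while the pairs involving “biodiversity” occur only once.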
netD3 <- paulWordOcc %>%
filter(n >= 25) %>%
graph_from_data_frame()
netD3 <- networkD3::igraph_to_networkD3(netD3, group = rep(1, vcount(netD3)), what = 'both')
networkD3::forceNetwork(Links = netD3$links,
Nodes = netD3$nodes,
Source = 'source',
Target = 'target',
NodeID = 'name',
Group = 'group',
zoom = TRUE,
linkDistance = 50,
fontSize = 12,
opacity = 0.9,
charge = -10)
Another neat visual tool available in R is the ability to produce custom word clouds (wordles) based on the results of your analyses using the package wordcloud2
paulWord <- paul %>%
mutate(linenumber = row_number()) %>%
ungroup() %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>% # remove stop words
count(document, word, sort = TRUE) %>% # counts the word frequency
ungroup() %>%
filter(n >= 20)
Joining, by = "word"
paulWordDF <- as.data.frame(paulWord[, c('word','n')])
# Basic wordle
wordcloud2::wordcloud2(paulWordDF, size = 1, color="random-light", backgroundColor=1)
# # Word shaped wordle
# wordcloud2::letterCloud(paulWordDF, word = "Paul")
#
# # Image shaped wordle
# wordcloud2::wordcloud2(paulWordDF, figPath = "./CoML_icon.png", size = 1.5)
#
# wordcloud2::wordcloud2(paulWordDF, figPath = "./CHONe.jpg", size = 1.5)