Solved – Use tm_filter to search for multiple words

I'm new to R, so please bear with me.

So, I know I can use the following to search for a word in several documents.

library(tm)
data("crude")
tm_filter(crude, FUN = function(x) any(grep("company", content(x))))

How do I search for more than one word? I'm thinking I could write a for loop, but I'm guessing there is a smarter way?

You can do this in the quanteda package easily using the kwic() function. For the crude corpus from tm, for instance, you can do the following:

devtools::install_github("kbenoit/quanteda", quiet = TRUE)
require(quanteda)
data(crude, package = "tm")
mycorpus <- corpus(crude)
kwic(mycorpus, "company")
##                                    contextPre keyword                     contextPost
##  [127, 67]                     oil market," a company spokeswoman said. Diamond is
##  [194, 64]                    dlrs a bbl. The company last changed its crude postings
## [236, 345] challenge to any international oil company that declared Kuwait sold below
##  [543, 71]                 to 16.35 dlrs, the company said. No changes were
##  [543, 91]                  of crude oil, the company said. Reuter

Note that here I have also included the installation command to get the latest version of quanteda from GitHub, since when I answered this post, the version (0.8.5-7) required to convert a tm VCorpus object correctly was not yet on CRAN. (Apparently the structure of a VCorpus object has changed recently.)

Note also that the references in brackets are the document name (in crude, these are just numbers) and the token serial number where the keyword occurs in the text.
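
If you would rather stay with tm_filter() from the question, one possible approach is to collapse the search words into a single regular expression instead of looping over them. This is only a sketch: contains_any() is a hypothetical helper made up for this example (it is not part of tm), the word list is arbitrary, and it assumes the words are plain strings without regex metacharacters.

```r
# Hypothetical helper (not part of tm): TRUE if `text` contains any of `words`.
# Assumes `words` are plain words with no regex metacharacters.
contains_any <- function(text, words) {
  # Build one alternation pattern, e.g. "\\b(company|spokeswoman)\\b"
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  # `text` may be a vector of lines, so reduce the matches with any()
  any(grepl(pattern, text, ignore.case = TRUE, perl = TRUE))
}

words <- c("company", "spokeswoman")

# Quick check of the predicate on plain strings
contains_any("a company spokeswoman said", words)  # TRUE
contains_any("crude prices fell", words)           # FALSE

# If tm is installed, the same predicate can be passed to tm_filter()
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  data("crude")
  hits <- tm_filter(crude, FUN = function(x) contains_any(content(x), words))
  length(hits)  # number of documents that mention at least one of the words
}
```

The word boundaries (\\b) keep "company" from matching inside a longer word such as "accompanying". To require every word rather than any one of them, you could test each word separately and combine the results with all().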
