I'm new to R, so please bear with me.
So, I know I can use the following to search for a word in several documents.
    library(tm)
    data("crude")
    tm_filter(crude, FUN = function(x) any(grep("company", content(x))))

How do I search for more than one word? I'm thinking I could write a for-loop, but I'm guessing there is a smarter way?
Best Answer
You can do this easily in the quanteda package using the kwic() function. For the crude corpus from tm, for instance, you can do the following:

    devtools::install_github("kbenoit/quanteda", quiet = TRUE)
    require(quanteda)
    data(crude, package = "tm")
    mycorpus <- corpus(crude)
    kwic(mycorpus, "company")
    ##                                    contextPre keyword contextPost
    ##  [127, 67]                    oil market," a company spokeswoman said. Diamond is
    ##  [194, 64]                   dlrs a bbl. The company last changed its crude postings
    ## [236, 345] challenge to any international oil company that declared Kuwait sold below
    ##  [543, 71]                to 16.35 dlrs, the company said. No changes were
    ##  [543, 91]                 of crude oil, the company said. Reuter
Note that I have also included the installation command to get the latest version of quanteda from GitHub, since when I answered this post, the version (0.8.5-7) required to convert a tm VCorpus object correctly was not yet on CRAN. (Apparently the structure of a VCorpus object has changed recently.)
Note also that the references in brackets are the document name (in crude, these are just numbers) and the token serial number where the keyword occurs in the text.
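
Since the question asks about searching for more than one word, here is a minimal sketch of one way to do it without a loop: collapse the words into a single regular expression with `|` and reuse the same `grep`-style predicate. The word list, the toy documents, and the helper name `has_any_word` are made up for illustration, and the sketch assumes the words contain no regex metacharacters. The last part only runs if tm is installed.

```r
# Hypothetical search terms; replace with your own.
words <- c("company", "market", "Kuwait")

# Collapse the terms into one regular expression: "company|market|Kuwait".
# Assumes the terms contain no regex metacharacters.
pattern <- paste(words, collapse = "|")

# Illustrative helper: TRUE if any element of a text matches the pattern.
has_any_word <- function(text, pattern) any(grepl(pattern, text))

# Toy documents standing in for a corpus (a document may have several lines).
docs <- list(
  a = "The company last changed its crude postings.",
  b = "Prices were unchanged.",
  c = c("Kuwait sold below", "official prices.")
)

# One logical value per document, no explicit for-loop needed.
hits <- vapply(docs, has_any_word, logical(1), pattern = pattern)
names(docs)[hits]

# The same predicate plugged into tm_filter(), if tm is available.
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  data("crude")
  matched <- tm_filter(crude, FUN = function(x) has_any_word(content(x), pattern))
  length(matched)
}
```

This matches substrings, so "company" would also match "accompanying" if you ignore case. If you need whole words only, wrapping the pattern in word boundaries, e.g. `paste0("\\b(", pattern, ")\\b")`, is one option.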