R-tokenizers
Port variant standard
Summary Fast tokenization of natural language text
Package version 0.3.0
Homepage https://docs.ropensci.org/tokenizers/
Keywords cran
Maintainer CRAN Automaton
License Not yet specified
Other variants There are no other variants.
Last modified 25 APR 2023, 21:12:43 UTC
Port created 14 APR 2020, 06:14:40 UTC
Subpackage Descriptions
single tokenizers: Fast, Consistent Tokenization of Natural Language Text. Converts natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, and regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
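The difference between shingled n-grams and skip n-grams can be shown with a short sketch. This is a minimal Python illustration of the two concepts, not the package's R API. The helper names `shingled_ngrams` and `skip_ngrams` are hypothetical. The skip n-gram definition used here (n words kept in order, at most k words skipped in total) is one common definition and may differ from the exact one the package implements.

```python
from itertools import combinations


def shingled_ngrams(words, n):
    """Overlapping (shingled) n-grams: every window of n consecutive words."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def skip_ngrams(words, n, k):
    """k-skip-n-grams under one common definition (an assumption):
    n words kept in their original order, starting at some index,
    with at most k words skipped in total."""
    result = []
    for start in range(len(words)):
        # The remaining n-1 words must come from the next n-1+k positions,
        # so that no more than k words are skipped overall.
        window = range(start + 1, min(len(words), start + n + k))
        for rest in combinations(window, n - 1):
            result.append(" ".join(words[j] for j in (start, *rest)))
    return result


words = "the quick brown fox".split()
print(shingled_ngrams(words, 2))  # ['the quick', 'quick brown', 'brown fox']
print(skip_ngrams(words, 2, 1))
```

With `k = 0` the skip n-gram sketch reduces to plain shingled n-grams; each increase in `k` admits more, sparser word combinations.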
Configuration Switches (platform-specific settings discarded)
This port has no build options.
Package Dependencies by Type
Build (only) gmake:primary:standard
R:primary:standard
icu:dev:standard
Build and Runtime R-stringi:single:standard
R-Rcpp:single:standard
R-SnowballC:single:standard
Runtime (only) R:primary:standard
R:nls:standard
Download groups
main mirror://CRAN/src/contrib
https://loki.dragonflybsd.org/cranfiles/
Distribution File Information
24571e4642a1a2d9f4f4c7a363b514eece74788d59c09012a5190ee718a91c29 458876 CRAN/tokenizers_0.3.0.tar.gz
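A downloaded distfile can be checked against the line above before use. This is a minimal Python sketch, assuming the 64-hex-digit value is a SHA-256 digest and 458876 is the file size in bytes (the usual Ravenports distinfo layout). The function name `verify_distfile` is hypothetical.

```python
import hashlib
import sys

# Values copied from the distribution file information above; interpreting
# them as a SHA-256 digest and a byte count is an assumption.
EXPECTED_SHA256 = "24571e4642a1a2d9f4f4c7a363b514eece74788d59c09012a5190ee718a91c29"
EXPECTED_SIZE = 458876


def verify_distfile(path, expected_sha256=EXPECTED_SHA256, expected_size=EXPECTED_SIZE):
    """Return True if the file's size and SHA-256 digest both match."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        # Hash in 64 KiB chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == expected_size and digest.hexdigest() == expected_sha256


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python verify.py tokenizers_0.3.0.tar.gz
    ok = verify_distfile(sys.argv[1])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

Checking the size as well as the digest gives a cheap early signal of a truncated download.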
Ports that require R-tokenizers:standard
R-tidytext:standard Text mining tool