libcats.org
Theory and Algorithms for Information Extraction and Classification in Textual Data Mining
Wu T.

Regular expressions can be used as patterns to extract features from semi-structured and narrative text [8]. For example, in police reports a suspect's height might be recorded as "{CD} feet {CD} inches tall", where {CD} is the part-of-speech tag for a numeric value. The results in [1] show that regular expressions can outperform explicit expressions in some applications, such as Posting Act Tagging. Although much work has been done in the field of information extraction, relatively little has focused on the automatic discovery of regular expressions. My Ph.D. research will therefore focus on the automatic generation of reduced regular expressions (RREs), as defined in [8], for use in Information Extraction (IE).

The learned reduced regular expressions can be used directly to extract features from free text, or to fill in templates in Eric Brill's Transformation-Based Learning (TBL) framework [2]. The original templates in TBL are explicit expressions, which are weaker than reduced regular expressions. I propose an innovative enhancement to TBL termed "Error-Driven Boolean-Logic-Rule-Based Learning" (BLogRBL) [9], which is strictly more powerful than TBL [2]. As in Brill's method, rules are automatically derived from templates during learning. BLogRBL differs from Brill's technique in that its rules take the form of complex expressions of combinational logic. The final contribution of my Ph.D. thesis will therefore be a framework that combines regular expression discovery with BLogRBL.

A necessary component of this research is a study of the various biases inherent in the use of reduced regular expressions in IE. The purpose of this work is to determine the language biases, search biases, and overfitting biases in the RRE discovery and BLogRBL algorithms.
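To make the height pattern from the abstract concrete, here is a minimal Python sketch of how a pattern such as "{CD} feet {CD} inches tall" might be matched against part-of-speech-tagged text. It is an illustration only and not the RRE formalism defined in [8]: the "word/TAG" text encoding, the function names `compile_pattern` and `extract`, and the sample sentence are all assumptions introduced here.

```python
import re


def compile_pattern(pattern):
    """Compile a pattern like '{CD} feet {CD} inches tall' into a regex.

    Assumption: the tagged text is encoded as space-separated 'word/TAG' tokens.
    '{TAG}' matches any word carrying that tag and captures the word;
    a bare word matches that literal word with any tag.
    """
    parts = []
    for item in pattern.split():
        if item.startswith("{") and item.endswith("}"):
            tag = re.escape(item[1:-1])
            parts.append(r"(\S+)/" + tag)
        else:
            parts.append(re.escape(item) + r"/\S+")
    # Anchor on token boundaries so a match never starts or ends mid-token.
    return re.compile(r"(?<!\S)" + r"\s+".join(parts) + r"(?!\S)")


def extract(pattern, tagged_tokens):
    """Return the captured words for every match of the pattern."""
    text = " ".join(f"{word}/{tag}" for word, tag in tagged_tokens)
    return [m.groups() for m in compile_pattern(pattern).finditer(text)]


# Hypothetical tagged sentence, for illustration only.
sentence = [
    ("The", "DT"), ("suspect", "NN"), ("is", "VBZ"),
    ("6", "CD"), ("feet", "NNS"), ("2", "CD"), ("inches", "NNS"), ("tall", "JJ"),
]
print(extract("{CD} feet {CD} inches tall", sentence))
```

The captured groups are the two numeric tokens, which could then serve as extracted features. Matching on the tag rather than on specific digits is what lets one pattern cover every height written in this form.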