International Science Index
Analyzing Environmental Emotive Triggers in Terrorist Propaganda
The purpose of this study is to measure the intersection of environmental security entities in terrorist propaganda. To the best of the author’s knowledge, this is the first study to examine this intersection within terrorist propaganda. The Rosoka natural language processing software and frame analysis are used to advance our understanding of how environmental frames function as emotive triggers. Violent jihadi demagogues use frames to suggest violent and non-violent solutions to their grievances. Emotive triggers are framed to leverage individual and collective attitudes in psychological warfare. A comparative research design is used because of the differences and similarities that exist between two variants of violent jihadi propaganda that target western audiences. Analysis is based on salience and network text analysis, which generates violent jihadi semantic networks. Findings indicate that environmental frames are used as emotive triggers across both data sets, but also as tactical and information data points. A key finding is that certain core environmental emotive triggers like “water,” “soil,” and “trees” are significantly salient at the aggregate level across both data sets. All environmental entities can be classified into two categories, symbolic and literal. Importantly, this research illustrates how demagogues use environmental emotive triggers in cyberspace from a subcultural perspective to mobilize target audiences to their ideology and praxis. Understanding the anatomy of propaganda construction is necessary in order to generate effective counter narratives in information operations. This research advances an additional method to inform practitioners and policy makers of how environmental security and propaganda intersect.
Composite Kernels for Public Emotion Recognition from Twitter
The Internet has grown into a powerful medium for information dispersion and social interaction. This has led to the rapid growth of social media, which allows users to easily post their emotions and perspectives on particular topics online. Our research uses natural language processing and text mining techniques to explore the public emotions expressed on Twitter by analyzing the sentiment behind tweets. In this paper, we propose a composite kernel method that integrates a tree kernel with a linear kernel to simultaneously exploit both the tree representation and the distributed emotion keyword representation, capturing the syntactic and content information in tweets. The experimental results demonstrate that our method can effectively detect the public emotion of tweets while outperforming the other compared methods.
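As a rough illustration of how a composite kernel can combine the two representations, the sketch below takes a weighted sum of a tree kernel and a linear kernel. It is a minimal sketch under stated assumptions, not the authors' method: the tree kernel here is a simplified stand-in that only counts shared production rules, and the names `productions`, `tree_kernel`, `composite_kernel` and the weight `alpha` are hypothetical.

```python
from collections import Counter

def productions(tree):
    """Count production rules in a nested-tuple parse tree.

    A tree is (label, child, child, ...); a leaf is a plain string.
    """
    if isinstance(tree, str):
        return Counter()
    label, *children = tree
    # A rule is the node label plus the labels (or words) of its children.
    rule = (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    counts = Counter([rule])
    for child in children:
        counts += productions(child)
    return counts

def tree_kernel(t1, t2):
    """Simplified tree kernel: dot product of production-rule counts."""
    p1, p2 = productions(t1), productions(t2)
    return float(sum(p1[r] * p2[r] for r in p1.keys() & p2.keys()))

def linear_kernel(x, y):
    """Dot product of two equal-length feature vectors."""
    return float(sum(a * b for a, b in zip(x, y)))

def composite_kernel(a, b, alpha=0.5):
    """Weighted sum of the tree kernel and the linear kernel.

    Each example is a (tree, keyword_vector) pair; alpha is an assumed
    mixing weight in [0, 1].
    """
    (tree_a, vec_a), (tree_b, vec_b) = a, b
    return alpha * tree_kernel(tree_a, tree_b) + (1 - alpha) * linear_kernel(vec_a, vec_b)

# Toy examples: a parse tree plus a two-dimensional emotion keyword vector.
t_love = ("S", ("NP", "I"), ("VP", ("V", "love"), ("NP", "it")))
t_hate = ("S", ("NP", "I"), ("VP", ("V", "hate"), ("NP", "it")))
x_love = (t_love, [1.0, 0.0])
x_hate = (t_hate, [0.0, 1.0])
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, a function of this shape can be used to build a precomputed Gram matrix for a kernel classifier.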
Emotional Analysis for Text Search Queries on Internet
The goal of this study is to analyze whether search queries carried out in search engines such as Google can offer emotional information about the user who performs them. Knowing the emotional state of the Internet user can be key to achieving the maximum personalization of content and to detecting worrying behaviors. For this, two studies were carried out using tools with advanced natural language processing techniques. The first study determines whether a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between the two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could involve adapting the technology and personalizing the responses to different emotional states.
Q-Map: Clinical Concept Mining from Clinical Documents
Over the past decade, there has been a steep rise in
the data-driven analysis in major areas of medicine, such as clinical
decision support system, survival analysis, patient similarity analysis,
image analytics etc. Most of the data in the field are well-structured
and available in numerical or categorical formats which can be used
for experiments directly. But on the opposite end of the spectrum,
there exists a wide expanse of data that is intractable for direct
analysis owing to its unstructured nature which can be found in the
form of discharge summaries, clinical notes, procedural notes which
are in human written narrative format and neither have any relational
model nor any standard grammatical structure. An important step
in the utilization of these texts for such studies is to transform
and process the data to retrieve structured information from the
haystack of irrelevant data using information retrieval and data mining
techniques. To address this problem, the authors present Q-Map in
this paper, which is a simple yet robust system that can sift through
massive datasets with unregulated formats to retrieve structured
information aggressively and efficiently. It is backed by an effective
mining technique, based on a string matching algorithm
indexed on curated knowledge sources, that is both fast
and configurable. The authors also briefly examine its comparative
performance with MetaMap, one of the most reputed tools for medical
concepts retrieval and present the advantages the former displays over
Methodology for Developing an Intelligent Tutoring System Based on Marzano’s Taxonomy
The Mexican educational system faces diverse challenges related to the quality and coverage of education. The development of Intelligent Tutoring Systems (ITS) may help to solve some of them by helping teachers to customize their classes according to the performance of the students in online courses. In this work, we propose the adaptation of a functional ITS based on Bloom’s taxonomy, called Sistema de Apoyo Generalizado para la Enseñanza Individualizada (SAGE), to measure students’ metacognition and their emotional response based on Marzano’s taxonomy. The students and the system will share control over progress through the course, so that students can improve their metacognitive skills. The system will not allow students to access subjects they have not yet mastered. The interaction between the system and the student will be implemented through Natural Language Processing techniques, thus avoiding the use of sensors to evaluate the student’s response. The teacher will evaluate the student’s knowledge utilization, which is equivalent to the last cognitive level in Marzano’s taxonomy.
Adaption Model for Building Agile Pronunciation Dictionaries Using Phonemic Distance Measurements
Whereas human beings can easily learn and adopt pronunciation variations, machines need training before being put into use. Humans also keep a minimal vocabulary, with pronunciation variations stored at the front of their memory for ready reference, while machines keep the entire pronunciation dictionary for ready reference. Pronunciation dictionaries are prepared with supervised methods, which take large amounts of manual effort, cost and time and are not suitable for real-time use. This paper presents an unsupervised adaptation model for building agile and dynamic pronunciation dictionaries online. These methods mimic the human approach to learning new pronunciations in real time. A new algorithm for measuring sound distances, called Dynamic Phone Warping, is presented and tested. The performance of the system is measured using an adaptation model, and the precision metric is found to be better than 86 percent.
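The abstract does not give the details of Dynamic Phone Warping, so the following is only a generic sketch of how a distance between two phoneme sequences can be computed by dynamic-programming alignment. The substitution costs, the `gap` penalty and the phoneme class table are illustrative assumptions, not values from the paper.

```python
def phone_distance(p, q, classes):
    """Assumed substitution cost: 0 if identical, 0.5 if same class, else 1."""
    if p == q:
        return 0.0
    if classes.get(p) is not None and classes.get(p) == classes.get(q):
        return 0.5
    return 1.0

def warped_distance(seq_a, seq_b, classes, gap=1.0):
    """Minimum-cost alignment of two phoneme sequences.

    cost[i][j] holds the cheapest way to align the first i phones of
    seq_a with the first j phones of seq_b.
    """
    n, m = len(seq_a), len(seq_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + phone_distance(seq_a[i - 1], seq_b[j - 1], classes),
                cost[i - 1][j] + gap,   # phone in seq_a has no counterpart
                cost[i][j - 1] + gap,   # phone in seq_b has no counterpart
            )
    return cost[n][m]

# Hypothetical phoneme classes and two pronunciation variants of one word.
CLASSES = {"t": "stop", "m": "nasal", "ah": "vowel", "ey": "vowel",
           "aa": "vowel", "ow": "vowel"}
variant_1 = ["t", "ah", "m", "ey", "t", "ow"]
variant_2 = ["t", "ah", "m", "aa", "t", "ow"]
```

With these assumed costs, two variants that differ in one vowel are half a unit apart, so a small threshold on the distance could be used to decide whether a new pronunciation belongs to an existing dictionary entry.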
Evaluating 8D Reports Using Text-Mining
Increasing quality requirements make reliable and effective quality management indispensable. This includes the complaint handling in which the 8D method is widely used. The 8D report as a written documentation of the 8D method is one of the key quality documents as it internally secures the quality standards and acts as a communication medium to the customer. In practice, however, the 8D report is mostly faulty and of poor quality. There is no quality control of 8D reports today. This paper describes the use of natural language processing for the automated evaluation of 8D reports. Based on semantic analysis and text-mining algorithms the presented system is able to uncover content and formal quality deficiencies and thus increases the quality of the complaint processing in the long term.
Social Media Idea Ontology: A Concept for Semantic Search of Product Ideas in Customer Knowledge through User-Centered Metrics and Natural Language Processing
In order to survive on the market, companies must
constantly develop improved and new products. These products are
designed to serve the needs of their customers in the best possible
way. The creation of new products is also called innovation and is
primarily driven by a company’s internal research and development
department. However, a new approach has been taking place for some
years now, involving external knowledge in the innovation process.
This approach is called open innovation and identifies customer
knowledge as the most important source in the innovation process. This paper presents a concept of using social media posts as an external source to support the open innovation approach in its
initial phase, the Ideation phase. For this purpose, the social media
posts are semantically structured with the help of an ontology and
the authors are evaluated using graph-theoretical metrics such as
density. For the structuring and evaluation of relevant social media
posts, we also use the findings of Natural Language Processing, e.g.,
Named Entity Recognition, specific dictionaries, Triple Tagger
and Part-of-Speech-Tagger. The selection and evaluation of the tools
used are discussed in this paper. Using our ontology and metrics
to structure social media posts enables users to semantically search
these posts for new product ideas and thus gain an improved insight
into the external sources such as customer needs.
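Graph density, the metric named above, is the share of possible connections that are actually present. The sketch below computes it for an undirected graph; treating the author network as undirected and without self-loops is an assumption, since the abstract does not specify the graph model.

```python
def density(nodes, edges):
    """Density of an undirected simple graph: 2m / (n * (n - 1)).

    Duplicate edges and self-loops are ignored.
    """
    n = len(nodes)
    if n < 2:
        return 0.0
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return 2 * len(unique) / (n * (n - 1))

# Hypothetical interaction graph between four post authors.
authors = {"a", "b", "c", "d"}
interactions = [("a", "b"), ("b", "c"), ("b", "a")]
```

A density close to 1 means almost every pair of authors is connected; a value near 0 means the network is sparse.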
Causal Relation Identification Using Convolutional Neural Networks and Knowledge Based Features
Causal relation identification is a crucial task in information extraction and knowledge discovery. In this work, we present two approaches to causal relation identification. The first is a classification model trained on a set of knowledge-based features. The second is a deep learning approach that trains convolutional neural networks to classify causal relations. We experiment with several different convolutional neural network (CNN) models based on previous work on relation extraction as well as our own research. Our models are able to identify both explicit and implicit causal relations as well as the direction of the causal relation. The results of our experiments show a higher accuracy than previously achieved for causal relation identification tasks.
Study of Syntactic Errors for Deep Parsing at Machine Translation
Syntactic parsing is vital for semantic treatment by many applications related to natural language processing (NLP), because form and content coincide in many cases. However, it has not yet reached the levels of reliable performance. By manually examining and analyzing individual machine translation output errors that involve syntax as well as semantics, this study attempts to discover what is required for improving syntactic and semantic parsing.
Part of Speech Tagging Using Statistical Approach for Nepali Text
Part of Speech (POS) tagging has always been a challenging task in Natural Language Processing. This article presents POS tagging for Nepali text using a Hidden Markov Model (HMM) and the Viterbi algorithm. Training and testing data sets are randomly separated from an annotated Nepali text corpus. Both methods are applied to the data sets. The Viterbi algorithm is found to be computationally faster and more accurate than the HMM. An accuracy of 95.43% is achieved using the Viterbi algorithm. An error analysis of where the mismatches took place is discussed in detail.
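For readers unfamiliar with the technique, Viterbi decoding finds the most probable tag sequence under an HMM by dynamic programming. The sketch below is a generic textbook version, not the authors' implementation; the tag set, the probabilities and the English placeholder words are invented for illustration and are not taken from the Nepali corpus.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for words under an HMM.

    Probabilities are looked up in nested dicts; unseen events fall
    back to a small floor so that logarithms stay finite.
    """
    if not words:
        return []

    def lp(x):
        return math.log(max(x, floor))

    # best[i][t]: log-probability of the best path ending in tag t at word i.
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda item: item[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical two-tag model with made-up probabilities.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.8, "VERB": 0.2}
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {"NOUN": {"dogs": 0.5, "bark": 0.1, "cats": 0.4},
        "VERB": {"bark": 0.7, "chase": 0.3}}
```

In a real tagger the start, transition and emission tables would be estimated from the annotated training split, and the floor would be replaced by a proper smoothing scheme.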
A Sentence-to-Sentence Relation Network for Recognizing Textual Entailment
Over the past decade, there have been promising developments in Natural Language Processing (NLP), with several investigations of approaches focusing on Recognizing Textual Entailment (RTE). These approaches include models based on lexical similarities, models based on formal reasoning, and most recently deep neural models. In this paper, we present a sentence encoding model that exploits sentence-to-sentence relation information for RTE. In terms of sentence modeling, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) adopt different approaches. RNNs are known to be well suited for sequence modeling, whilst CNNs are suited for the extraction of n-gram features through their filters and can learn ranges of relations via the pooling mechanism. We combine these strengths of RNNs and CNNs to present a unified model for the RTE task. Our model combines relation vectors computed from the phrasal representation of each sentence with the final encoded sentence representations. Firstly, we pass each sentence through a convolutional layer to extract a sequence of higher-level phrase representations, from which the first relation vector is computed. Secondly, the phrasal representation of each sentence from the convolutional layer is fed into a Bidirectional Long Short Term Memory (Bi-LSTM) network to obtain the final sentence representations, from which a second relation vector is computed. The relation vectors are combined and then used in the same fashion as an attention mechanism over the Bi-LSTM outputs to yield the final sentence representations for classification. Experiments on the Stanford Natural Language Inference (SNLI) corpus suggest that this is a promising technique for RTE.
Context Detection in Spreadsheets Based on Automatically Inferred Table Schema
Programming requires years of training. With natural language and end user development methods, programming could become available to everyone. These methods enable end users to program their own devices and extend the functionality of an existing system without any knowledge of programming languages. In this paper, we describe an Interactive Spreadsheet Processing Module (ISPM), a natural language interface to spreadsheets that allows users to address ranges within the spreadsheet based on an inferred table schema. Using the ISPM, end users are able to search for values in the schema of the table and to address the data in spreadsheets implicitly. Furthermore, it enables them to select and sort the spreadsheet data by using natural language. ISPM uses a machine learning technique to automatically infer areas within a spreadsheet, including different kinds of headers and data ranges. Since ranges can be identified from natural language queries, end users can query the data using natural language. During the evaluation, 12 undergraduate students were asked to perform operations (sum, sort, group and select) using the system and also using Excel without the ISPM interface, and the time taken for task completion was compared across the two systems. Only for the selection task did users take less time in Excel (since they directly selected the cells using the mouse) than in ISPM. Using natural language for end user software engineering can help to overcome the present bottleneck of professional developers.
A Hybrid Multi-Criteria Hotel Recommender System Using Explicit and Implicit Feedbacks
Recommender systems, also known as recommender engines, have become an important research area and are now being applied in various fields. In addition, the techniques behind recommender systems have improved over time. In general, such systems help users to find the products or services they need (e.g. books, music) by analyzing and aggregating other users’ activities and behavior, mainly in the form of reviews, and making the best recommendations. The recommendations can facilitate the user’s decision-making process. Despite the wide literature on the topic, using multiple data sources of different types as the input has not been widely studied. Recommender systems can benefit from the high availability of digital data to collect input data of different types, which implicitly or explicitly help the system to improve its accuracy. Moreover, most of the existing research in this area is based on single rating measures, in which a single rating is used to link users to items. This paper proposes a highly accurate hotel recommender system, implemented in various layers. Using a multi-aspect rating system and benefitting from large-scale data of different types, the recommender system suggests hotels that are personalized and tailored for the given user. The system employs natural language processing and topic modelling techniques to assess the sentiment of the users’ reviews and extract implicit features. The entire recommender engine contains multiple sub-systems, namely user clustering, a matrix factorization module, and a hybrid recommender system. Each sub-system contributes to the final composite set of recommendations by covering a specific aspect of the problem. The accuracy of the proposed recommender system has been tested intensively, and the results confirm the high performance of the system.
An Analysis of Learners’ Reports for Measuring Co-Creational Education
To increase the quality of learning, teachers and learners need to make a mutual effort to realize educational value. For this purpose, we need to manage co-creational education among teachers and learners. In this research, we try to find features of co-creational education. To be more precise, we analyze learners’ reports using natural language processing and extract features that describe the state of co-creational education.
Role of Natural Language Processing in Information Retrieval; Challenges and Opportunities
This paper aims to analyze the role of natural
language processing (NLP). The paper will discuss the role in the
context of automated data retrieval, automated question answering, and
text structuring. NLP techniques are gaining wider acceptance in real
life applications and industrial concerns. There are various
complexities involved in processing the text of natural language that
could satisfy the need of decision makers. This paper begins with the
description of the qualities of NLP practices. The paper then focuses
on the challenges in natural language processing. The paper also
discusses major techniques of NLP. The last section describes
opportunities and challenges for future research.
Arabic Word Semantic Similarity
This paper is concerned with the production of an Arabic word semantic similarity benchmark dataset. It is the first of its kind for Arabic which was particularly developed to assess the accuracy of word semantic similarity measurements. Semantic similarity is an essential component to numerous applications in fields such as natural language processing, artificial intelligence, linguistics, and psychology. Most of the reported work has been done for English. To the best of our knowledge, there is no word similarity measure developed specifically for Arabic. In this paper, an Arabic benchmark dataset of 70 word pairs is presented. New methods and best possible available techniques have been used in this study to produce the Arabic dataset. This includes selecting and creating materials, collecting human ratings from a representative sample of participants, and calculating the overall ratings. This dataset will make a substantial contribution to future work in the field of Arabic WSS and hopefully it will be considered as a reference basis from which to evaluate and compare different methodologies in the field.
Aspect Oriented Software Architecture
Natural language processing systems pose a unique
challenge for software architectural design as system complexity has
increased continually and systems cannot be easily constructed from
loosely coupled modules. Lexical, syntactic, semantic, and pragmatic
aspects of linguistic information are tightly coupled in a manner that
requires separation of concerns in a special way in design,
implementation and maintenance. An aspect oriented software
architecture is proposed in this paper after critically reviewing
relevant architectural issues. For the purpose of this paper, the
syntactic aspect is characterized by an augmented context-free
grammar. The semantic aspect is composed of multiple perspectives
including denotational, operational, axiomatic and case frame
approaches. Case frame semantics matured in India from deep
thematic analysis. It is argued that lexical, syntactic, semantic and
pragmatic aspects work together in a mutually dependent way and
their synergy is best represented in the aspect oriented approach. The
software architecture is presented with an augmented Unified
Named Entity Recognition using Support Vector Machine: A Language Independent Approach
Named Entity Recognition (NER) aims to classify each word of a document into predefined target named entity classes and is nowadays considered fundamental for many Natural Language Processing (NLP) tasks such as information retrieval, machine translation, information extraction, question answering systems and others. This paper reports on the development of a NER system for Bengali and Hindi using a Support Vector Machine (SVM). Though this state-of-the-art machine learning technique has been widely applied to NER in several well-studied languages, its use for Indian languages (ILs) is very new. The system makes use of the different contextual information of the words along with a variety of features that are helpful in predicting the four different named entity (NE) classes: Person name, Location name, Organization name and Miscellaneous name. We have used annotated corpora of 122,467 tokens of Bengali and 502,974 tokens of Hindi tagged with the twelve different NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). In addition, we have manually annotated 150K wordforms of the Bengali news corpus, developed from the web-archive of a leading Bengali newspaper. We have also developed an unsupervised algorithm to generate lexical context patterns from a part of the unlabeled Bengali news corpus. Lexical patterns have been used as features of the SVM in order to improve the system performance. The NER system has been tested with gold standard test sets of 35K and 60K tokens for Bengali and Hindi, respectively. Evaluation results have demonstrated recall, precision, and f-score values of 88.61%, 80.12%, and 84.15%, respectively, for Bengali and 80.23%, 74.34%, and 77.17%, respectively, for Hindi. Results show an improvement in the f-score of 5.13% with the use of context patterns.
Statistical analysis (ANOVA) is also performed to compare the performance of the proposed NER system with that of the existing HMM-based system for both languages.
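The reported f-scores are consistent with the balanced F-measure, the harmonic mean of precision and recall. The short check below recomputes them from the precision and recall figures in the abstract; the function name `f_score` is only illustrative.

```python
def f_score(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall (in percent) as reported in the abstract.
bengali = f_score(precision=80.12, recall=88.61)
hindi = f_score(precision=74.34, recall=80.23)
```

Both values agree with the reported 84.15% (Bengali) and 77.17% (Hindi) to two decimal places.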
A Thai to English Machine Translation System Using Thai LFG Tree Structure as Interlingua
Machine Translation (MT) between the Thai and English languages has been a challenging research topic in natural language processing. Most research has been done on English to Thai machine translation, but not the other way around. This paper presents a Thai to English Machine Translation System that translates a Thai sentence into interlingua of a Thai LFG tree using LFG grammar and a bottom up parser. The Thai LFG tree is then transformed into the corresponding English LFG tree by pattern matching and node transformation. Finally, an equivalent English sentence is created using structural information prescribed by the English LFG tree. Based on results of experiments designed to evaluate the performance of the proposed system, it can be stated that the system has been proven to be effective in providing a useful translation from Thai to English.
Humanoid Personalized Avatar Through Multiple Natural Language Processing
There has been a growing interest in implementing humanoid avatars in networked virtual environments. However, most existing avatar communication systems do not take avatars’ social backgrounds into consideration. This paper proposes a novel humanoid avatar animation system to represent personalities and facial emotions of avatars based on culture, profession, mood, age, taste, and so forth. We extract semantic keywords from the input text through natural language processing, and then the animations of personalized avatars are retrieved and displayed according to the order of the keywords. Our primary work is focused on giving avatars runtime instructions from multiple natural languages. Experiments with Chinese, Japanese and English input based on the prototype show that interactive avatar animations can be displayed in real time and be made available online. This system provides a more natural and interesting means of human communication, and therefore is expected to be used for cross-cultural communication, multiuser online games, and other entertainment applications.
An Integrated Natural Language Processing Approach for Conversation System
The main aim of this research is to investigate a novel technique for implementing a more natural and intelligent conversation system. Conversation systems are designed to converse like a human as much as their intelligence allows. In some sense, they can be thought of as the embodiment of Turing’s vision. They usually return a predetermined answer in a predetermined order, but conversations abound with uncertainties of various kinds. This research focuses on an integrated natural language processing approach. This approach includes an integrated knowledge-base construction module, a conversation understanding and generation module, and a state manager module. We discuss the effectiveness of this approach based on an experiment.
Sounds Alike Name Matching for Myanmar Language
A personal name matching system is at the core of
essential tasks in national citizen databases, text and web mining,
information retrieval, online library systems, e-commerce and record
linkage systems. This has motivated extensive research in
the area of name matching. Traditional name matching methods
are suitable for English and other Latin-based languages. Asian
languages that have no word boundaries, such as the Myanmar language,
still require a sounds-alike matching system for Unicode-based
applications. Hence we propose a matching algorithm that obtains
analogous sounds-alike (phonetic) patterns suited to Myanmar
character spelling. Given the nature of Myanmar characters, we
consider word boundary fragmentation and character collation.
We therefore use a pattern conversion algorithm that builds word
patterns from the fragmented and collated characters. We create
Myanmar sounds-alike phonetic groups to help in phonetic matching. The
experimental results show a fragmentation accuracy of 99.32% and
a processing time of 1.72 ms.
Performance Analysis of MT Evaluation Measures and Test Suites
Many measures have been proposed for machine
translation evaluation (MTE) while little research has been done on
the performance of MTE methods. This paper is an effort for MTE
performance analysis. A general frame is proposed for the description
of the MTE measure and the test suite, including whether the
automatic measure is consistent with human evaluation, whether
different results from various measures or test suites are consistent,
whether the content of the test suite is suitable for performance
evaluation, the degree of difficulty of the test suite and its influence
on the MTE, the relationship of MTE result significance and the size
of the test suite, etc. For a better clarification of the frame, several
experiment results are analyzed relating human evaluation, BLEU
evaluation, and typological MTE. A visualization method is
introduced for better presentation of the results. The study aims to
aid in the construction of test suites and in method selection in MTE
Thematic Role Extraction Using Shallow Parsing
Extracting thematic (semantic) roles is one of the
major steps in representing text meaning. It refers to finding the
semantic relations between a predicate and syntactic constituents in a
sentence. In this paper we present a rule-based approach to extract
semantic roles from Persian sentences. The system exploits a two-phase
architecture to (1) identify the arguments and (2) label them
for each predicate.
For the first phase we developed a rule based shallow parser to
chunk Persian sentences and for the second phase we developed a
knowledge-based system to assign 16 selected thematic roles to the
chunks. The experimental results of testing each phase are shown at
the end of the paper.
Distributional Semantics Approach to Thai Word Sense Disambiguation
Word sense disambiguation is one of the most important open problems in natural language processing applications such as information retrieval and machine translation. Many strategies can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy, which employs an unsupervised learning method for disambiguation. We report our investigation of Latent Semantic Indexing (LSI), an unsupervised information retrieval technique, applied to the task of Thai noun and verb word sense disambiguation. Latent Semantic Indexing has been shown to be efficient and effective for information retrieval. For the purposes of this research, we report experiments on two Thai polysemous words, namely /hua4/ and /kep1/, which are used as representatives of Thai nouns and verbs respectively. The results of these experiments demonstrate the effectiveness and indicate the potential of applying vector-based distributional information measures to semantic disambiguation.
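To make the LSI idea concrete, the sketch below builds a small term-by-context matrix, reduces it with a truncated singular value decomposition, and compares a new context with known contexts by cosine similarity in the latent space. It is a minimal sketch assuming NumPy is available; the English toy contexts for an ambiguous word, the vocabulary and the choice of `k = 2` are illustrative assumptions and do not reproduce the Thai experiments.

```python
import numpy as np

def build_matrix(contexts, vocab):
    """Term-by-context count matrix (rows: terms, columns: contexts)."""
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(contexts)))
    for j, ctx in enumerate(contexts):
        for w in ctx:
            if w in index:
                A[index[w], j] += 1
    return A

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular values and left vectors."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k]

def fold_in(vec, U_k, s_k):
    """Project a term-count vector into the k-dimensional latent space."""
    return (vec @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical contexts for two senses of one ambiguous word.
contexts = [
    ["money", "loan", "deposit"],     # financial sense
    ["loan", "interest", "money"],    # financial sense
    ["river", "water", "shore"],      # riverside sense
    ["water", "shore", "fish"],       # riverside sense
]
vocab = sorted({w for ctx in contexts for w in ctx})
A = build_matrix(contexts, vocab)
U_k, s_k = lsi_fit(A, k=2)

# A new context shares no exact word pair with any single training context.
query = build_matrix([["deposit", "interest"]], vocab)[:, 0]
q_hat = fold_in(query, U_k, s_k)
sim_financial = cosine(q_hat, fold_in(A[:, 0], U_k, s_k))
sim_riverside = cosine(q_hat, fold_in(A[:, 2], U_k, s_k))
```

In this toy setup the new context lands next to the financial contexts in the latent space even though it overlaps with each of them in only one word, which is the kind of indirect association that a vector-based distributional measure is meant to capture.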
Structural Parsing of Natural Language Text in Tamil Using Phrase Structure Hybrid Language Model
Parsing is important in Linguistics and Natural
Language Processing to understand the syntax and semantics of a
natural language grammar. Parsing natural language text is
challenging because of the problems like ambiguity and inefficiency.
Also the interpretation of natural language text depends on context
based techniques. A probabilistic component is essential to resolve
ambiguity in both syntax and semantics thereby increasing accuracy
and efficiency of the parser. The Tamil language has some inherent
features that make parsing more challenging. To address them,
a lexicalized and statistical approach is applied in parsing
with the aid of a language model. Statistical models mainly focus on
the semantics of the language and are suitable for large vocabulary
tasks, whereas structural methods focus on syntax and suit
small vocabulary tasks. A statistical language model based on trigrams
for the Tamil language with a medium vocabulary of 5000 words has
been built. Though statistical parsing gives better performance
through tri-gram probabilities and large vocabulary size, it has some
disadvantages like focus on semantics rather than syntax, lack of
support in free ordering of words and long term relationship. To
overcome these disadvantages, a structural component is
incorporated into the statistical language model, which leads to the
implementation of a hybrid language model. This paper attempts
to build a phrase-structured hybrid language model that resolves
the above-mentioned disadvantages. In developing the hybrid
language model, a new part-of-speech tag set for the Tamil language has
been developed, with more than 500 tags and wider
coverage. A phrase-structured treebank has been developed with 326
Tamil sentences covering more than 5000 words. The hybrid
language model has been trained on the phrase-structured treebank
using the immediate-head parsing technique. A lexicalized and statistical
parser that employs this hybrid language model and the immediate-head
parsing technique gives better results than pure grammar-based and
trigram-based models.
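The trigram component mentioned above can be sketched as follows. This is a minimal illustration only: it uses hypothetical English placeholder sentences instead of Tamil data, add-one smoothing is an assumption (the abstract does not name a smoothing scheme), and the function names are invented for this example. It covers the purely statistical model, not the hybrid phrase-structured one.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories over whitespace tokens."""
    trigrams, histories, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        # Two start symbols so the first real word has a full history.
        tokens = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for i in range(2, len(tokens)):
            trigrams[tuple(tokens[i - 2:i + 1])] += 1
            histories[tuple(tokens[i - 2:i])] += 1
    return trigrams, histories, vocab


def trigram_prob(trigrams, histories, vocab, w1, w2, w3):
    """P(w3 | w1, w2) with add-one smoothing (an assumed choice)."""
    return (trigrams[(w1, w2, w3)] + 1) / (histories[(w1, w2)] + len(vocab))


# Hypothetical placeholder corpus, not taken from the paper.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
trigrams, histories, vocab = train_trigram(corpus)

p_seen = trigram_prob(trigrams, histories, vocab, "the", "cat", "sat")
p_unseen = trigram_prob(trigrams, histories, vocab, "the", "cat", "dog")
```

A seen continuation receives a higher probability than an unseen one, but the model only ever looks at the two preceding words, which is why it struggles with free word order and relationships between distant words.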
ORank: An Ontology Based System for Ranking Documents
The growing volume of information on the
internet creates an increasing need for new (semi-)automatic
methods for retrieving documents and ranking them according to
their relevance to the user query. In this paper, after a brief review
of ranking models, a new ontology-based approach for ranking
HTML documents is proposed and evaluated in various
circumstances. Our approach is a combination of conceptual,
statistical and linguistic methods. This combination preserves the
precision of ranking without losing speed. Our approach
exploits natural language processing techniques for extracting
phrases and stemming words. Then an ontology-based conceptual
method is used to annotate documents and expand the query.
To expand a query, the spreading activation algorithm is improved so
that the expansion can be done along various aspects. The annotated
documents and the expanded query are processed to compute
the degree of relevance using statistical methods. The outstanding
features of our approach are (1) combining conceptual, statistical
and linguistic features of documents, (2) expanding the query with
its related concepts before comparing it to documents, (3) extracting
and using both words and phrases to compute the degree of relevance, (4)
improving the spreading activation algorithm to perform expansion based
on a weighted combination of different conceptual relationships and
(5) allowing variable document vector dimensions. A ranking
system called ORank is developed to implement and test the
proposed model. The test results will be included at the end of the
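The query expansion step described in this abstract can be sketched with a generic spreading activation routine in which each relationship type carries its own weight. This is only an illustration under stated assumptions: the toy ontology, the relation names, the weights, the decay factor, and the threshold are all hypothetical, and the authors' improved algorithm may differ.

```python
def spread_activation(ontology, relation_weights, seeds, decay=0.8, threshold=0.2):
    """Propagate activation from the query concepts through the ontology.

    ontology maps a concept to a list of (neighbour, relation) pairs.
    Activation passed along an edge is the source activation multiplied by
    the relation weight and the decay factor; values below the threshold
    are dropped. Weights are assumed to lie in the range (0, 1].
    """
    activation = {concept: 1.0 for concept in seeds}
    frontier = list(seeds)
    while frontier:
        concept = frontier.pop()
        for neighbour, relation in ontology.get(concept, []):
            value = activation[concept] * relation_weights[relation] * decay
            # Keep only the strongest activation seen for each concept.
            if value >= threshold and value > activation.get(neighbour, 0.0):
                activation[neighbour] = value
                frontier.append(neighbour)
    return activation


# Hypothetical toy ontology and relation weights.
ontology = {
    "car": [("vehicle", "is_a"), ("engine", "has_part")],
    "vehicle": [("transport", "is_a")],
    "engine": [("piston", "has_part")],
}
relation_weights = {"is_a": 1.0, "has_part": 0.5}

expanded_query = spread_activation(ontology, relation_weights, seeds=["car"])
```

Concepts that survive the threshold, together with their activation values, can then serve as weighted terms of the expanded query. Weighting relations differently lets one relation type contribute more to the expansion than another.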
Semi-Automatic Analyzer to Detect Authorial Intentions in Scientific Documents
Information Retrieval studies
models and builds systems that allow a user to find the
relevant documents suited to their information need.
Information search remains a difficult problem because of the
difficulty of representing and processing natural language phenomena such
as polysemy. Intentional structures promise to be a new paradigm to
extend existing document structures and to enhance the different
phases of document processing such as creation, editing, search and
retrieval. Recognizing the intentions of the authors of texts can reduce
the scale of this problem. In this article, we present an intention
recognition system based on a semi-automatic method for
extracting intentional information from a corpus of text.
This system is also able to update the ontology of intentions to
enrich the knowledge base containing all possible intentions
of a domain. This approach relies on the construction of a semi-formal
ontology, which is considered as the conceptualization of the intentional
information contained in a text. An experiment on scientific
publications in the field of computer science was conducted to
validate this approach.
SMaTTS: Standard Malay Text to Speech System
This paper presents a rule-based text-to-speech
(TTS) synthesis system for Standard Malay, namely SMaTTS. The
proposed system uses the sinusoidal method and some pre-recorded
wave files to generate speech. The use of a phone
database significantly decreases the amount of computer memory
space used, making the system very light and embeddable. The
overall system comprises two phases. The first is Natural Language
Processing (NLP), which consists of the high-level processing of text
analysis, phonetic analysis, text normalization and a morphophonemic
module. The module was designed specially for SM to overcome
a few problems in defining the rules for the SM orthography system before
the text is passed to the DSP module. The second phase is Digital
Signal Processing (DSP), which operates on the low-level process of
the speech waveform generation. An intelligible and
adequately natural-sounding formant-based speech synthesis system
with a light and user-friendly Graphical User Interface (GUI) is
introduced. A Standard Malay (SM) phoneme set and an
inclusive phone database have been constructed carefully for
this phone-based speech synthesizer. By applying generative
phonology, comprehensive letter-to-sound (LTS) rules and a
pronunciation lexicon have been developed for SMaTTS. For the
evaluation tests, a Diagnostic Rhyme Test (DRT) word list was
compiled and several experiments were performed to evaluate
the quality of the synthesized speech by analyzing the Mean Opinion
Score (MOS) obtained. The overall performance of the system as
well as the room for improvement is thoroughly discussed.
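For readers unfamiliar with the evaluation measure, a Mean Opinion Score is conventionally the arithmetic mean of listener ratings on a five-point scale. The short sketch below shows that calculation; the function name and the ratings are hypothetical and are not results from the paper.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings given on the usual 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not 1 <= rating <= 5 for rating in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(ratings)


# Hypothetical ratings from four listeners for one synthesized utterance.
example_ratings = [4, 5, 3, 4]
example_mos = mean_opinion_score(example_ratings)
```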