Orthographic Analysis of Anagram Detection Measures

Slide Note
Embed
Share

The research focuses on analyzing anagram detection through orthographic methods. It explores the art of anagram creation and the importance of orthographic analysis in verifying anagrams. Motivations for the study include the need for standardized cognitive monitoring and the educational benefits of anagram-solving tasks. The aim is to develop enhanced anagram detection techniques using similarity measures, comparing and improving existing methods.


Uploaded on Sep 23, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Orthographic Analysis of Anagram through Anagram Detection Measures 1.RAJI RAJI- -LAWAL LAWAL HANAT Y. ,2.AKINWALE AKINWALE ADIO T. AND 3FOLORUNSHO FOLORUNSHO O. 1.DEPARTMENT OF COMPUTER SCIENCE, LAGOS STATE UNIVERSITY ,2.,3..DEPARTMENT OF COMPUTER SCIENCE, FEDERAL UNIVERSITY OF AGRICULTURE ABEOKUTA, DEPARTMENT OF COMPUTER SCIENCE

  2. Outline of Presentation Introduction Aim and Objective Literature Review Methodology: Existing Methods Draw Back of the Existing Methods Improved Methods Result Result Discussion Conclusion References

  3. Introduction The rearrangement of the characters of a word to form another word of the same length n- gram but different position is called anagram. Several new words can be formed from a single word through word permutation. Orthography is the art of writing words with the proper letters, according to accepted usage and correct spelling. Orthographic analysis is important for word verification in anagram detection, this is to enhance easy filtration of invalid generated anagrams.

  4. Introduction Motivation: It is necessary to devise a standard means of monitoring recovery rate and level of patients with cognition problem like aphasia disease. It is necessary to improvise for a cognitive career counsellor. This is due to the fact that, cognitive test reveals working memory capacity, and can in turn be used to predict student s ability to withstand some course of study. Several applications had swept students attention from reading, which is a defect factor of student s reading fluency. Anagram solving task is fun fill and also educative in terms of word at sight and reading fluency for students.

  5. Aim and Objective Aim The aim of this research is to develop improved anagram detection techniques using similarity measures. Objectives Review related works to anagram detection techniques Study the detection techniques to be able to get their draw backs Design an anagram detection technique to improve existing ones Compare the performance of the existing and improved methods

  6. Literature Review The canonical well known model of word recognition, stems from Interactive Activation Model of Word and Letter Perception. It uses slot-filler representations for letters, and are thus poorly suited to capture letter permutation effects. Thus, the model does not seem well-suited for modelling normal reading. Henin, J. et al (2009) proposed a solution that takes up a prominent feature of the early interactive activation models: units corresponding to structures at multiple spatial (and temporal) scales (e.g. phoneme, bigram, word). The model, NGRAMSWELL, was proposed by this researcher. Henin, J. etal (2009)

  7. Literature Review Anagram tasks are frequently used in behavioural research to investigate a wide array of cognitive phenomena. Most prominently, they are used to study the cognitive stages involved in problem solving, specifically insight. Robert D. Vincent(2006). Menalaos E.S and Chris T.P(2013) affirmed that anagram solution tasks have been frequently used to assess word recognition processes. Also anagram solution ability is closely related to reading. Reading is an innate skill, it is one of the most crucial cognitive skill that support school based learning. This is because it is a lengthy process that requires mastering a large set of strategies. Some cognitive tasks like reading, spelling, making lexical decision, solving anagram task e.t.c. requires knowledge of orthographic structure of English. Examples of orthographic structure are syllabification, orthographic neighbourhood size, and bigram frequency.(Norvick L.R and Sherman S.J, 2004)

  8. Literature Review Sandra M. et al (2009) researched on a new measure, the Northwestern Anagram Test (NAT) NAT was developed to test: accuracy of word order (syntax) in sentence production in patients with speech production problem word comprehension word finding difficulties reduced working memory capacity. The anagram method adopted by the researcher required the assembly of individual word cards presented in scrambled order into meaningful sentences. It was hypothesized that the NAT would correlate with measures of syntax but not with measures of naming, word comprehension, or motor speech production. It was predicted that performance would be influenced by grammatical complexity.

  9. Literature Review Robert D.V. et al (2006), worked on anagram software for cognitive research which provided different modes of interactive and automatic operations. The interactive mode allowed a user to provide a hard generated list of test strings or words for evaluation of possible anagram. The list can be entered either interactively or through an input file. The automatic mode pseudo randomly generated a specific number of fixed length test string. In either case, the test string was converted to a sorted order where each possible sub-string was considered. All possible anagrams were identified and the lemma frequency information for all orthographically identical word forms was summed and printed. The research work did not consider bi-gram frequency in anagrams.

  10. Literature Review Menalaos E. S. and Chris T. P. (2013) researched on: the effects of syllabic structure and grapheme frequency of target words on an anagram solving task. They also checked for presumable differences on the solving performance of average and below average readers. This was achieved by measuring time spent and user s number of moves for solving different non- sense anagrams. Data analysis revealed that anagram solving time is affected by the syllabic structure of target words. The main effect of syllabic complexity indicates that both groups of average and below average readers were equally affected. The effect of syllabic complexity was also noticed in the reading fluency measure, where the above average group outperformed the below average group.

  11. Literature Review(Table 1: Previous Approaches to Orthographic Analysis of Anagram Task ) S/NO. 1 Author Menalaos E.S and Chris T.P(2013) Method Syllabic structure and grapheme frequency of target anagram solvingtask Pro It unveiled how: -Syllabic structure affects anagram solvingtime - The effect of Syllabic complexity on reading fluency measure Cons The correlation between the recognition containing infrequent grapheme and those with phoneme-to-grapheme mapping is not explored words in of anagram frequent 2 Norvick Shennem S.J(2004) L.R and Position-Sensitive gram frequency type-based Bi- It better predicts the difficulty of anagram solution compared to token based numbers It is restrictedto five letters word

  12. Literature Review S/NO. 3 Author Sergio al(2014) Method Applied rasch scaling to syllabic structure Pro It ability to solve anagram and the relative difficulty of anagram Cons Syllable restricted letters. p. et establish individual s number to is five 4 Hua-Zhan Yin et al(2016) Syllabic method structure It depicts unconscious and conscious error detection in anagram solution task Orthography restricted to Chinese language. is 5 Couriear P. and Lequexx M(2004) Orthographic neighbourhoodsize Established between prime and target stimuli relationship lexical Research conducted analysis standard software was implemented was using units, tool, no

  13. Literature Review 6 Mary A.F et al (1981) -Syllabic structure neighbourhood and -Syllabic structure was used to compare the effect of copying words on memory. -Neighbourhood frequency was used to compare memory for anagram Research using standard implemented. was conducted tool, Method Orthographic frequency analysis no S/NO. Author Pro Cons software was 7 Sandra M et al (2009) Exploration of syllabic structure It is used to evaluate the cognitive Anagram test was conducted orally ability of patients with primary progressive Aphasia It enable direct psycholinguistic features that may influence the cognitive process in anagram solution 8 Robert, D. et al(2006) Syllabic structure using sorting anagram detection technique control of -Bigram frequency calculation was not directly incorporated in the software Orthographic neighbourhood frequency was not explored Bubble sort makes execution using N-Gramslower 9 Henin J. et al (2009) The Orthographic structure used is NGRAM WELL, with bubble sort as a measure of anagram detection -It serves as better predictor of anagram detection. This is done by using bubble sort to find the distance of transforming supplied anagram answer to target word. -It is position free a

  14. Methodology Existing Anagram Detection Methods: Brute Force: This has to do with listing all permutations of the first string, and check if the second string is equal to any of the permutations of the first. This gives a very poor complexity n! Sorting: two strings are anagrams of each other if they are equal, when their letters are sorted. Counting: It has to do with counting how many times each character occur in each string, and confirm that each string has the same number of character as the other. Histogram: This works by building frequency histogram of characters in each string, and check whether each histogram are the same.

  15. Methodology Brute Force is not considered in this research because it has a poor running time, it runs with as much as n! The methods that were explored are: Sorting Counting Histogram is not considered because it execution process is similar to counting

  16. Anagram Detection with Sorting 1 , S S S S x y i j Where Si,Sj are characters of sorted list of strings X and Y. |Sx|,|Sy| represents length of sorted list X and Y it finds out the presence of a character in both strings, and add 1 to sum for each presence. The sum of the intersection must be equal to the length of X as well as Y for anagram to hold. S S i j

  17. Anagram Detection with Counting , n m 1 C C i ji = = 0, 0 j C represents the number of count of each character i.e. Frequency If the strings are anagram the result will be one, because the number of counts of all characters will be zero.

  18. Draw Back of Existing Methods The existing techniques analyse anagram in words by considering the occurrence of characters without taking cognisance of position of characters, it doesn t do autographic analysis. For example to analyse ALLERGY,ALLERGY anagrammatically. The existing methods tests for the occurrence of all the characters in the first word in the second, without considering their positions.

  19. New Methods for Anagram detection using pattern matching technique: Anagram detection Similarity Measure (ADSM) 1 x y x x y y x y * * 2 p j i 0 1 x y 0 The above is used to keep track of the length of the two strings. x x y y The above is used to monitor the occurrence of various characters in the strings. x y 2 P j i The above is a property function used to monitor the position of characters in the strings.

  20. ADSM ARCHITECTURE FIGURE 1: ADSM ARCHITECTURE: Using syllabic similarity between target word and supplied word

  21. Bi-Anagram Similarity Measure(BASM) 2 gram A gram A B B 1 max2 , BASM checks for: - Character verification - Character position check - Orthographic Analysis - Syllabic structure

  22. Bi-Anagram Similarity Measure(BASM) BASM checks the level of permutation of characters in strings, by finding the number of common bi-gram in the strings. This is to categorize anagrams into: Highly Permuted Anagram i.e. strong Anagram: this occurs if bi-anagram intersection between two strings of equal length but different character position is zero, hard anagrams are categorized here. Moderately Permuted Anagram i.e Averagely Anagram: This occurs if bi-anagram intersection between two strings of equal length but different character position is between 1 and n/2. Moderately hard anagram are categorized here.

  23. Bi-Anagram Similarity Measure(BASM) Weakly Permuted Anagram i.e Weak Anagram: This occurs if bi-anagram intersection between two strings of equal length but different character position is between n/2 and n. Simple anagrams are categorized here. This detection technique ranges from 0-1, the higher the permutation, the higher the anagram detection value. It will test for permutation in strings only if ADSM is 1, i.e. there must be evidence of true existence of anagram, before permutation detection can be carried out.

  24. Given a string : ALLERGY With corresponding anagram: GALLERY, LARGELY, REGALLY Bi-gram representation of anagrams is calculated as follows: AL LL LE ER RG GY GA AL LL LE ER RY 4 6 = (1 0) * (1 ) 0.333 The result of GALLERY to ALLERGY indicates weakly permuted anagram of the value of 0.333. The bi-anagram of ALLERGY and REGALLY is computed as follows: AL LL LE ER RG GY RE EG GA AL LL LY 2 6 = (1 0) * (1 ) 0.67 The result of REGALLY and ALLERGY indicates moderately permuted anagram of the value of 0.67 The bi-anagram of ALLERGY and LARGELY is calculated as follows: AL LL LE ER RG GY LA AR RG GE EL LY 1 6 = (1 0) *(1 ) 0.833 The result of LARGELY and REGALLY indicates strongly permuted anagram of the value of 0.833

  25. BASM Architecture FIGURE 2 : BASM Architecture: Using bi-anagram frequency between target string and supplied string

  26. Anagram Methods Analysis Table S/NO. TEST WORD/ USER S ANSWER ADTS ADTC ADSM BASM Auctioned/Cautioned 1 1 1 0.25 1 2 Auctioned/educatio 0 0 0 0.0 3 Allergy/Regally 1 1 1 0.67

  27. 4 Allergy/Galllery 0 0 0 0.0 5 Antler/Rental 1 1 1 0.8 6 Antler/Antler 1 1 0 0.0 7 Ales/Leas 1 1 1 0.67

  28. 8 Ales/Laes 1 1 0 0.00 9 Assert/Asters 1 1 1 0.6 10 Asserts/Strong 0 0 0 0.0

  29. RESULT Anagram Methods with Averaged Analysis Table S/NO. TEST WORD/ USER S ANSWER ADTS ADTC ADSM BASM 1 Auctioned/Cautioned 1/2.3*10^-5 1/6.5*10^-4 1/0.7*10^-4 0.25/0.6*10^-4 2 Auctioned/educatio 0/2.3*10^-5 0/6.5*10^-4 0/0.7*10^-4 0.0/0.6*10^-4 3 Allergy/Regally 1/1.9*10^-5 1/0.7*10^-4 1/0.4*10^-4 0.67/2.3*10^-4 4 Allergy/Galllery 0/1.9*10^-5 0/0.7*10^-4 0/0.4*10^-4 0.0/2.3*10^-4 5 Antler/Rental 1/2.3*10^-5 1/7.5*10^-5 1/6.5*10^-5 0.8/0.6*10^-4

  30. 6 Antler/Antler 1/2.3*10^-5 1/7.5*10^-5 0/6.5*10^-5 0.0/0.6*10^-4 7 Ales/Leas 1/1.5*10^-5 1/3.3*10^-5 1/3.4*10^-5 0.67/0.6*10^-4 8 Ales/Laes 1/1.5*10^-5 1/3.3*10^-5 0/3.4*10^-5 0.67/0.6*10^-4 9 Assert/Asters 1/1.8*10^-5 1/4.4*10^-5 1/4.6*10^-5 0.6/4.9*10^-5 10 Asserts/Strong 0/1.8*10^-5 0/4.4*10^-5 0/4.6*10^-5 0.0/4.9*10^-5 Total 0.000178 0.001744 0.00051 0.000918 Average 0.0000178 0.0001744 0.000051 0.0000918

  31. Result Discussion The Table shows the anagram status and processing time of each text and pattern. The processing time of the dataset for each method was averaged in the Table, which indicates: that ADTS is the fastest ADSM is faster than BASM ADTC is the slowest. Thus, only ADTS out perform the new methods in terms of execution speed, but this is justified by the ability of ADSM to detect anagram with cognisance of position of character. BASM is considerable because in addition to taking cognisance of position, it also find the distance between anagrams.

  32. Conclusion Previous approaches to orthographic analysis of anagram task were based on: syllabification orthographic neighbourhood size bi-gram frequency. The two improved measures in this research are: ADSM : ADSM explored the use of syllabic structure and syllabic similarity for the measure of anagram relationship between two strings. This is to measure the working memory capacity of individuals in terms of anagram solving ability. It does this by testing for character entailment verification with different position using fuzzification analysis which results to either true or false. BASM. BASM is an improved ADSM in the sense that apart from verifying the character entailment with position using bi-anagram, it further measures the strength of an anagram and classify them as simple, averagely hard or hard. This is to classify individual s cognitive ability as weak, average or strong. Thus, BASM can evaluate the level of working memory capacity in individuals more accurately than others.

  33. References References Adams J.W, Stone M., Vincent R.D, Muncer S.J (2011), The role of syllables in anagram solution: A Rasch analysis, The journal of general psychology. 138(2):94-109 Anagram detection techniques, web.stanford.edu/class/cs9/lectures/04/Anagram.pdf[cited:02/0217] Cornelissen P.L. Hansen P.C, Gilchrist I.D, Cormark F., Essex J., Frankish C. (1997), Coherent motion detection and letter position encoding, vision research, Vol.33 6(4), 2181-2191. Courriea P. and Lequex(2004), anagram effect in visual word recognition, https://archives- onverts.fr/hat0042984[cited:01/01/17] Fink E.T, Weisberg W.R(1981), The use of phonemic information to solve anagrams, memory and cognition. 9(4):404-410 Gilhooly & Johnson(1978), Effect of solution word attributes on anagram difficulty, A regression analysis quarterly journal of experimental psychology. 30: 57-70. Grimes and Mozer(2001), The interplay of symbolic and sub-symbolic processes in anagram problem solving. Advances in neural information processing system 13:17-23. Cambridge, MA:MIT Press

  34. References Henin J., Accorsi, E., Cho, P., Tahori, N., (2009), Extraordinary Natural Ability: Anagram solution as an extension of normal reading ability. Annual meeting of Cognitive science society Proceedings. 31st annual meeting of the cognitive science society proceedings, Mahuah, New Jersey: Lawrence Erlbaum Associates. Hua-Zhan Yin, Junyi Yang, Wei Li, Jiang Qui, Ling Yu Chen (2016),Neural bases of unconscious error detection in a Chinese anagram solution task: Evidence from ERP study, Journal Pone 0154379. Mary A F, Hugh J.F, Alice W., Leslie R. (1989), Anagram solving: Does effort have an effect, memory and cognition. 17(6): 755-758 Menelaos E.S and Chris T.P (2013), Linguistic effect on anagram solution: The case of transparent language, world class journal of education. 3(4): 41-51 Norvick L.R and Sharman S.J(2004), Type based bigram frequencies for five letter words. Behaviour Research Methods, Instruments and Computers, 36(3), 397-401. http://dx.doi.org/10.3758/BF03195587. Robert D.V, Yael K.G, Debra A.T (2006), Anagram software for cognitive research that enables specification of Pscholinguistic variables, Behaviour research methods, 38(2) Sandra W., M.-Marse M., Christina W, Alfred R, Emily J.R, Cynthia K.T.(2009), The Northerwestern Test: Measuring sentence production in primary progressive Aphasia, American Journal of Alzhheinder s disease and other Dementias Sergio P., Augustin O., Juan E.T., Pedro P.(2014), Randomized anagram revisited, Cosec Laboratory, Elsevier

  35. THANK YOU FOR LISTENING

Related


More Related Content