Classification of characters using modular techniques
Bachelor thesis
Permanent link
https://hdl.handle.net/11250/2778071
Publication date
2021
Abstract
In this project we examine modular architectures for training a model that classifies the kanji and hiragana used in written Japanese. Japanese writing uses three scripts side by side, hiragana, katakana and kanji, each serving a different purpose. Hiragana and katakana each use about 71 characters that represent sounds, typically a consonant followed by a vowel. For example, の (hiragana) and ノ (katakana) both represent the sound “no”. Kanji, on the other hand, represent ideas rather than sounds, so a large number of characters is needed; several thousand are in daily use. JIS recommends 6000 characters for everyday use in Japan and separates them into level 1 and level 2. We train on JIS level 1 to cover most of the important characters.
With a CNN with a linear architecture, others have reached up to 99.5% accuracy on 878 characters (Tsai, 2016). When we tested the same model on 3036 characters, however, we reached only 40% accuracy. Because this accuracy was so low, we decided to try a divide-and-conquer strategy: multiple expert models, each specialized in its own part of the problem space.
The first method we tried is Mixture of Experts (Jacobs et al., 1991), in which we train 12 experts together with a gating model that assigns each expert a weight for every input instance. This produces models that are specialized on separate parts of the problem space. The second method is Negative Correlation Learning (Liu & Yao, 1999), which creates distinct experts by computing the correlation between the activations of the models and rewarding negative correlations.
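To make the two techniques above concrete, here is a minimal sketch in plain Python. It is not the thesis implementation: the function names, the toy inputs and the penalty strength `lam` are illustrative assumptions. The Mixture of Experts part shows only how a softmax gate weights the experts' class probabilities for one input. The Negative Correlation Learning part uses the commonly cited penalty p_i = (F_i - F_mean) * sum over j != i of (F_j - F_mean) on a single scalar output, which may differ from the exact loss used in the thesis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(expert_probs, gate_scores):
    """Combine expert predictions for ONE input (hypothetical helper).

    expert_probs: list of K probability vectors, one per expert.
    gate_scores:  list of K raw scores produced by the gating model.
    Returns the gate-weighted average of the expert probability vectors.
    """
    gate = softmax(gate_scores)
    n_classes = len(expert_probs[0])
    return [
        sum(g * probs[c] for g, probs in zip(gate, expert_probs))
        for c in range(n_classes)
    ]


def ncl_penalties(expert_outputs):
    """Correlation penalty per expert for one scalar ensemble output.

    p_i = (F_i - F_mean) * sum_{j != i} (F_j - F_mean)
    Minimizing p_i pushes expert i away from the ensemble mean.
    """
    mean = sum(expert_outputs) / len(expert_outputs)
    devs = [f - mean for f in expert_outputs]
    total = sum(devs)
    return [d * (total - d) for d in devs]


def ncl_losses(expert_outputs, target, lam):
    """Per-expert loss: squared error plus lam times the penalty.

    lam is an assumed hyperparameter; lam = 0 trains experts independently.
    """
    penalties = ncl_penalties(expert_outputs)
    return [
        0.5 * (f - target) ** 2 + lam * p
        for f, p in zip(expert_outputs, penalties)
    ]


if __name__ == "__main__":
    # Two toy experts over two classes; the gate favors the first expert.
    print(mixture_of_experts([[0.9, 0.1], [0.2, 0.8]], [2.0, 0.0]))
    # Two toy experts disagreeing around a target of 2.0.
    print(ncl_losses([1.0, 3.0], target=2.0, lam=0.5))
```

In the setup described above there would be 12 experts and one output per character class. The gate scores would come from a trained gating model rather than fixed numbers.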