Network of Languages (using Twitter tweets)

The LINtwitter project aims to map the network between different languages using Twitter “tweets”. The key idea is that languages are the nodes, and two languages are connected by an edge when people tweet in both of them. Node size reflects the number of people tweeting in a language, and edges are the connections between languages. The project uses the Compact Language Detector to detect the language of each tweet and Python to determine the probability of similarity between any two languages.
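As a rough illustration of the node and edge idea, the sketch below counts users per language and users shared between each pair of languages. It is not the project's actual code: the function name build_language_network, the (user, language) input format, and the sample data are all assumptions made for this example, and language detection is assumed to have already been done.

```python
from collections import defaultdict
from itertools import combinations


def build_language_network(user_languages):
    """Build node sizes and edge weights from (user, language) pairs.

    user_languages: iterable of (user_id, language_code) tuples, one per
    tweet whose language was already detected (hypothetical input format).
    """
    languages_by_user = defaultdict(set)
    for user, lang in user_languages:
        languages_by_user[user].add(lang)

    node_size = defaultdict(int)    # language -> number of users
    edge_weight = defaultdict(int)  # (lang_a, lang_b) -> number of shared users

    for langs in languages_by_user.values():
        for lang in langs:
            node_size[lang] += 1
        # Every pair of languages used by the same person strengthens an edge.
        for pair in combinations(sorted(langs), 2):
            edge_weight[pair] += 1

    return dict(node_size), dict(edge_weight)


# Made-up sample data for illustration only.
tweets = [
    ("alice", "en"), ("alice", "es"),
    ("bob", "en"), ("bob", "en"),
    ("carol", "es"), ("carol", "pt"), ("carol", "en"),
]
nodes, edges = build_language_network(tweets)
```

Here a user who tweets several times in one language is counted once for that language, so node size tracks people rather than tweets.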

My work involved determining the symbols and words that were responsible for misidentifying one language as another. The tokens that caused errors were #, $, “the”, “a”, and smileys (which were recognized as Greek). To overcome this, the minimum probability required to accept a detected language was raised from sixty-two to seventy-five, at which point the language detector identified most of the languages correctly. This helped build more connections between the languages and strengthen the existing ones.
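The effect of raising the threshold can be sketched as a simple filter over detector output. This is a minimal sketch, not the project's code: the function name filter_detections, the tuple layout, the sample scores, and the reading of sixty-two and seventy-five as percentages are all assumptions, and no real detector is called.

```python
def filter_detections(detections, min_confidence=75):
    """Keep only detections whose confidence meets the threshold.

    detections: iterable of (tweet_text, language_code, confidence) tuples,
    where confidence is assumed to be a percentage from 0 to 100.
    """
    return [
        (text, lang)
        for text, lang, confidence in detections
        if confidence >= min_confidence
    ]


# Made-up detector output for illustration only.
sample = [
    ("good morning everyone", "en", 97),
    (":) :) #win $$$", "el", 64),  # smileys and symbols misread as Greek
    ("buenos días a todos", "es", 91),
]

kept_at_62 = filter_detections(sample, min_confidence=62)
kept_at_75 = filter_detections(sample, min_confidence=75)
```

With these invented scores, the symbol-heavy tweet passes at the lower threshold but is dropped at the higher one, so it no longer adds a spurious Greek node to the network.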