Skip to content

faithcomesbyhearing/lang_tree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

lang_tree

AI language tools, such as speech-to-text, text-to-speech, and language-ident have been developed for only a small percentage of the known languages. As a result when doing AI work on a language that none of the tools process, it is essential to find a language closely related, which does have AI tool support.

The lang_tree program uses a glotto log hierarch of languages to find related languages. This tool is able to find related languages for eSpeak, mms-language-ident, mms-speech-to-text, mms-text-to-speech, whisper-speech-to-text, whisper-translation.

Searches are done up and down the tree to find the closest language that is supported by the AI tool of interest. Closeness is defined by a simple count of the number of nodes. This is a naive solution. If someone can suggest a statistic that defines the similarity of a language to its children in the hierarchy, a better algorithm can be added.


Use the lang_tree executable as follows:

Usage: lang_tree [-v] <iso-code> <ai-tool>
optional -v is used for a detailed response
iso-code can be any iso639-3 or iso639-1 code
ai-tool can one of the following: espeak, mms_asr, mms_lid, mms_tts, whisper

To install this package as a go module:

go get github.com/faithcomesbyhearing/lang_tree

To use lang_tree in a go program:

var tree = NewLanguageTree(ctx context.Context)
err := tree.Load()
languages, distance, err := tree.Search(iso639 string, aiTool string)

The AI tools supported are as follows:

  • espeak
  • mms_asr
  • mms_lid
  • mms_tts
  • whisper

The search returns a slice of iso639 codes that are related and are supported by the AI tool. The list can be empty if none are found. The list will contain multiple, if multiple supported languages are on the same level in the hierarchy.

For the user who wishes to dig deeper, there is another search method.

languages, distance, err := tree.DetailSearch(iso639 string, aiTool string)

The DetailSearch, performs the same algorithm, but returns more information. It returns a slice of type db.Language, which is as follows:

type Language struct {
	GlottoId    string      `json:"id"`
	FamilyId    string      `json:"family_id"`
	ParentId    string      `json:"parent_id"`
	Name        string      `json:"name"`
	Bookkeeping bool        `json:"bookkeeping"`
	Level       string      `json:"level"` //(language, dialect, family)
	Iso6393     string      `json:"iso639_3"`
	CountryIds  string      `json:"country_ids"`
	Iso6391     string      `json:"iso639_1"`
	ESpeak      string      `json:"espeak"`
	MMSASR      string      `json:"mms_asr"`
	MMSLID      string      `json:"mms_lid"`
	MMSTTS      string      `json:"mms_tts"`
	Whisper     string      `json:"whisper"`
	Parent      *Language   `json:"-"`
	Children    []*Language `json:"-"`
}

About

This program finds related languages for one or more that supported by an AI tool.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors