DocumentUploading
Alex Rudnick edited this page Jul 20, 2013
- how do we put a person in the loop? do we need someone to approve gn (Guarani) documents on upload? probably.
- quality of the gn text? typo checking? language identification to make sure it's actually gn?
- what's appropriate content?
- what kinds of content do we actually want to host?
- are there any legal implications here?
- how to upload bilingual documents and keep both halves together?
- how to identify which sentences are translations of each other? ...
- spam filtering?
- "does this even look like gn?"
- can we do some automatic checks to look for bad spacing, typos, etc.?
So maybe there's a semiautomatic process that first filters out documents in the wrong language and documents with terrible gn, and only then presents the rest to the human.
(who is the human? who would volunteer to do this?)
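A minimal sketch of what that semiautomatic pre-filter could look like, assuming we have reference samples of approved gn text and of whatever languages are likely to show up by mistake (e.g. Spanish). It uses character-trigram profiles compared by cosine similarity as a crude stand-in for real language identification, plus two regex spacing checks. The function names, the 5% issue threshold, and the `reject`/`human_review` labels are all hypothetical. The sketch only routes documents: anything it doesn't reject still goes to the human.

```python
import math
import re
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams of lowercased, whitespace-normalized text.

    Padding with spaces lets word beginnings and endings count as features.
    """
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_language(text: str, profiles: dict[str, Counter]) -> str:
    """Return the label of the reference profile closest to the text."""
    doc = char_ngrams(text)
    return max(profiles, key=lambda lang: cosine(doc, profiles[lang]))


# Hypothetical, deliberately crude spacing checks; real typo checking
# would need a gn word list or spellchecker.
SPACING_PATTERNS = {
    "double space": re.compile(r" {2,}"),
    "space before punctuation": re.compile(r" [,.;:!?]"),
}


def spacing_issues(text: str) -> list[str]:
    """List one issue name per match of each spacing pattern."""
    issues = []
    for name, pattern in SPACING_PATTERNS.items():
        issues.extend(name for _ in pattern.finditer(text))
    return issues


def triage(text: str, profiles: dict[str, Counter], target: str = "gn",
           max_issue_rate: float = 0.05) -> tuple[str, list[str]]:
    """Reject obvious junk; queue everything else for a human reviewer."""
    lang = guess_language(text, profiles)
    if lang != target:
        return ("reject", [f"looks like {lang}, not {target}"])
    issues = spacing_issues(text)
    words = max(len(text.split()), 1)
    if len(issues) / words > max_issue_rate:
        # Too many problems per word: treat as "terrible gn".
        return ("reject", issues)
    # Passed the automatic checks; the flagged issues go along to the reviewer.
    return ("human_review", issues)
```

To use it, build `profiles` from the reference samples (e.g. `{"gn": char_ngrams(gn_sample), "es": char_ngrams(es_sample)}`) and call `triage(upload_text, profiles)`. Trigram profiles are cheap and need no gn-specific tooling, but they are unreliable on very short documents and on mixed gn/Spanish text, so a proper language-ID model could be swapped in behind `guess_language` later.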
Mike wants any and all valid gn text in the corpus. Somebody else might want to filter out, e.g., advertising or religious/political propaganda...