Aseryla

Model Description: Language

Processing a Sentence

Although Stanford CoreNLP can handle several sentences at once, the system splits the text into simple sentences.
It also adapts the text so that the results match what the "semantic analyzer" expects.
Keep in mind that CoreNLP is quite sensitive to syntactic errors.

Actions applied to the input text:

The system delimits a sentence when a dot or the end of the text is found; white space is the word separator
consecutive new lines are interpreted as a new sentence
multiple whitespaces are removed (e.g. "the    nice cat" → "the nice cat")
every line break and tab is removed
the symbols : ; .. ? are interpreted as end of sentence
the punctuation -- ( ) { } [ ] is treated as a sentence separator → , (comma)
" and ! are removed
non- is replaced by not (non-retractile → not retractile)
/ is replaced by or (he/she is nice → he or she is nice)
& is replaced by and
numbers are checked for validity (only positive integers)
every contraction is expanded, such as I'm to I am, mustn't to must not, and so on.
Stanford CoreNLP deals correctly with contractions, but it is better for the system to avoid them so that they are not introduced into the dictionary (which word is 'm?)
If a word is not valid, it is discarded from the sentence. A word is valid when every character is an English letter (no numbers, punctuation or symbols). However, the word may end in a dot or comma, the first character may be uppercase while the rest must be lowercase, and the symbol - may appear in the middle of the word (e.g. warm-blooded)
THE SAXON GENITIVE: When a genitive is used in a sentence, the system assumes the main subject is the "belonged object", not the person. Therefore, for the system the sentence "John's car is red" is the same as "car is red".
The word "it" is discarded because it is an impersonal pronoun that can't be substituted by the concept "person"
"it" is used either impersonally (it rains) or as a rhetorical device to avoid repeating the subject in the sentence ("Look at that dog! It looks ill", instead of "Look at that dog! That dog looks ill")
In neither case can "it" be interpreted as a concept to associate relations with.
The possessive pronouns (me, yours, ...) are discarded because they refer to the context of the subject and do not affect its description.
In "the cat scratched my head", it doesn't matter whether it is my head or your head; for the system the important point is that "heads can be scratched".
Therefore, in a semantic context "my money is our gain" is the same as "money is gain".
Only positive integers are allowed: "a size between -2.344 and 4.9 inches" → "a size between 2 and 5 inches" (rounded).
To avoid overflowing Stanford CoreNLP, sentences are limited to 30 words each. This is a safety measure against processing large text files with no dots.

E.g., the sentence "All t4r sales are non-refundable! ; Don't close the door (neither the window)" becomes:

  Sentence1= All sales are not refundable
  Sentence2= Do not close the door, neither the window
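The clean-up actions listed above can be sketched roughly as follows. This is only an illustration, not the system's actual code: the function names, the three-entry contraction table and the choice to truncate (rather than split) sentences longer than 30 words are assumptions, and the rounding of negative or decimal numbers is left out.

```python
import re

# Tiny illustrative sample; the real contraction list is not given in the text.
CONTRACTIONS = {"i'm": "I am", "don't": "do not", "mustn't": "must not"}
# First letter in any case, the rest lowercase, optional inner hyphens.
VALID_WORD = re.compile(r"[A-Za-z][a-z]*(?:-[a-z]+)*")

def expand(token):
    """Expand a known contraction, keeping an initial capital letter."""
    expansion = CONTRACTIONS.get(token.lower())
    if expansion is None:
        return [token]
    if token[0].isupper():
        expansion = expansion[0].upper() + expansion[1:]
    return expansion.split()

def normalize(text, max_words=30):
    text = re.sub(r"\n{2,}", " . ", text)          # consecutive new lines = new sentence
    text = re.sub(r"[\n\t]", " ", text)            # drop line breaks and tabs
    text = re.sub(r'["!]', "", text)               # drop " and !
    text = re.sub(r"\bnon-", "not ", text)         # non-refundable -> not refundable
    text = text.replace("/", " or ").replace("&", " and ")
    text = re.sub(r"--|[(){}\[\]]", " , ", text)   # brackets and -- become commas
    text = text.replace(",", " , ")                # make every comma its own token
    sentences = []
    for chunk in re.split(r"[.:;?]+", text):       # end-of-sentence symbols
        words = []
        for token in chunk.split():                # split() also collapses whitespace
            for word in expand(token):
                if word == ",":
                    if words and not words[-1].endswith(","):
                        words[-1] += ","
                elif word.isdigit() or VALID_WORD.fullmatch(word):
                    words.append(word)             # invalid words are discarded
        words = words[:max_words]                  # assumption: truncate long sentences
        if words:
            sentences.append(" ".join(words).rstrip(","))
    return sentences

print(normalize("All t4r sales are non-refundable! ; Don't close the door (neither the window)"))
```

Run on the example sentence, the sketch yields the two sentences shown above.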

In essence, this subprocess takes the results obtained from Stanford CoreNLP and converts the branches of the constituent tree into internal language, which represents the grammatical relations that populate the memory structures (learning).

Basically, the idea is to apply the following rules to every sentence provided to the system:

Start from the part of speech tags and the constituent tree obtained from Stanford CoreNLP

Example:
(S
(NP (PRP We))
(VP (VBD fed)
(NP (DT the) (JJ nice) (NN cat)))
)

Remember that constituent trees are tagged with the Penn Treebank tag set

Find the concepts, which are the main nouns identified in the noun phrases
(NP (DT the) (JJ nice) (NN cat)) → cat

Associate the identified characteristics with the concepts; characteristics are the semantic words that precede the main noun in a noun phrase, such as:

Adjectives = features
(NP (DT the) (JJ nice) (NN cat)) → "cat IS nice"
Verbs = skills
(NP (DT the) (VBG jumping) (NN kangaroo)) → "kangaroo CAN jump"
Or even nouns (noun as adjective)
(NP (DT the) (NN animal) (NN man))
this case is ambiguous: is "animal" a parent or a feature of "man"?
it can't be resolved with this information alone, so it is stored in the frame memory as a special case ("man ADJNOUN animal")
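As a rough illustration of the noun phrase rules above, the following sketch reads a flat NP in bracketed Penn Treebank notation and emits the internal language codes. The function name and the one-entry lemma table are hypothetical stand-ins (the real system relies on the lemmas provided by the parser), and nested phrases are not handled.

```python
import re

# Hypothetical stand-in for a real lemmatizer.
LEMMAS = {"jumping": "jump"}

def np_relations(np):
    """Extract relations from a flat NP such as (NP (DT the) (JJ nice) (NN cat))."""
    leaves = re.findall(r"\((\w+) (\w+)\)", np)      # [(tag, word), ...]
    nouns = [i for i, (tag, _) in enumerate(leaves) if tag.startswith("NN")]
    if not nouns:
        return []
    head_index = nouns[-1]                            # main noun = last noun of the NP
    head = leaves[head_index][1].lower()
    relations = []
    for tag, word in leaves[:head_index]:             # words preceding the main noun
        word = LEMMAS.get(word.lower(), word.lower())
        if tag.startswith("JJ"):                      # adjective -> feature
            relations.append(f"{head} IS {word}")
        elif tag.startswith("VB"):                    # verb -> skill
            relations.append(f"{head} CAN {word}")
        elif tag.startswith("NN"):                    # noun as adjective -> special case
            relations.append(f"{head} ADJNOUN {word}")
    return relations

print(np_relations("(NP (DT the) (JJ nice) (NN cat))"))
print(np_relations("(NP (DT the) (VBG jumping) (NN kangaroo))"))
print(np_relations("(NP (DT the) (NN animal) (NN man))"))
```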

On the other hand, get the main verb (VP), the subjects (NP) and the objects (NPs in the VP), identifying properties and relations such as:

Parent relations: when the verb is "be" (objects are nouns): cat is an animal → "cat ISA animal"
Feature relations: the main verb is "be" but the objects are adjectives: cats are nice → "cat IS nice"
Attributes: when the verb is "have": cats and dogs have legs or limbs → "cat HAVE leg" / "cat HAVE limb" / "dog HAVE leg" / "dog HAVE limb"
Skills: the main verb is an action verb (not modal, not auxiliary): the cat has been running → "cat CAN run"
Affected Actions: (same case as above but for the objects) the cat ate the mouse → "mouse CANBE eat"
                            or directly with the grouping "can + be": the pen could be thrown through the window → "pen CANBE throw"
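The mapping from main verb to relation type can be sketched as below. The function and its parameters are hypothetical; it assumes the subjects, the verb lemma and the objects have already been extracted from the tree, and it ignores negation and modifiers.

```python
def verb_relations(subjects, verb, objects=(), objects_are_adjectives=False):
    """Map subjects, main verb lemma and objects to internal language codes."""
    if verb == "be":
        link = "IS" if objects_are_adjectives else "ISA"   # feature vs parent
        return [f"{s} {link} {o}" for s in subjects for o in objects]
    if verb == "have":                                      # attribute
        return [f"{s} HAVE {o}" for s in subjects for o in objects]
    relations = [f"{s} CAN {verb}" for s in subjects]       # skill
    relations += [f"{o} CANBE {verb}" for o in objects]     # affected action
    return relations

print(verb_relations(["cat", "dog"], "have", ["leg", "limb"]))
print(verb_relations(["cat"], "eat", ["mouse"]))
```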

When the sentence is negative (it contains the negative particle NOT), the relations are the same except for their tendency (in memory they are stored in a format that makes clear the relation is negative when it is queried)

Parent relation: cats are not robots → "cat ISA robot -1"
Feature relation: cat is not ugly → "cat IS ugly -1"
Attribute relation: cats don't have wings → "cat HAVE wing -1"
But there is an exception when the object has modifiers; in this scenario the negation affects the attribute, not the relation:
For example, take the sentence "a stool doesn't have large backrest"
the fact that stools don't have a large backrest does not imply that stools can't have a backrest
→ "stool HAVE backrest +1 / stool%backrest IS large -1"
Skill relation:
cats can't fly → "cat CAN fly -1"
cats didn't eat the mouse
not doing the action now does not mean the subject can never do it
so the action verb is accepted as a negative skill only if the verb "can" is specified
Therefore in this special case → "cat CAN eat" (positive)
Affected Action relation:
Same scenario as above: the action verb is accepted as a negative affected action only when the verb "can" is specified
"that dog did not bark at the cat" → "cat CANBE bark" (positive)
"the dog can not bark at the cat" → "cat CANBE bark -1"
The verb 'can' is a modal, and the negation ('not') affects the action verb: "a cat can't be a dog" → "cat ISA dog -1"
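The way negation decides the tendency of a relation could be expressed as in this sketch. The function name and arguments are hypothetical, and the exception for attributes with modifiers is not covered.

```python
def tendency(verb, negated, has_can=False):
    """Return -1 for a negative relation and +1 for a positive one."""
    if not negated:
        return 1
    if verb in ("be", "have"):        # parent, feature and attribute relations
        return -1
    # Action verbs: not doing something now does not mean it can never be done,
    # so the relation is negative only when the modal "can" is present.
    return -1 if has_can else 1

print(tendency("be", negated=True))                  # cats are not robots
print(tendency("fly", negated=True, has_can=True))   # cats can't fly
print(tendency("eat", negated=True))                 # cats didn't eat the mouse
```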

If more than one action verb is identified, the last one is considered the link verb, and the others only produce subject relations.
For example: dogs and cats had been looking and running in the terrace    →    [SUBJECT: dog, cat / VERBS: look, run / OBJECTS: terrace]
     "dog CAN look" / "cat CAN look"
     "dog CAN run" / "cat CAN run"
     "terrace CANBE run" (but not "terrace CANBE look": as 'look' is not the link verb, the objects are not affected by it)
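A minimal sketch of this rule, following the note that only the link verb affects the objects; the function name is hypothetical and the inputs are assumed to be already lemmatized.

```python
def multi_verb_relations(subjects, verbs, objects):
    """Every verb gives a skill to each subject; only the last (link) verb reaches the objects."""
    if not verbs:
        return []
    relations = [f"{s} CAN {v}" for v in verbs for s in subjects]
    link_verb = verbs[-1]
    relations += [f"{o} CANBE {link_verb}" for o in objects]
    return relations

print(multi_verb_relations(["dog", "cat"], ["look", "run"], ["terrace"]))
```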

Lastly, there are some exceptions when the NPs contain modifiers, which affect the rules described in point 2

In sentences with "be" as main verb (ISA / parent relation):
E.g. "Cat is a small mammal and a good pet"
small and good will be assigned to cat, instead of small to mammal and good to pet respectively
In sentences with "have" as main verb (attribute relation):
"cats have retractile claws"
retractile refers to the claws of a cat, neither to cats nor to claws in general

In both exceptions, if a negative particle appears, the negation affects the modifier, not the object
"cat is not a big animal" → cat ISA animal / cat IS big -1
"cat has not big claws" → cat HAVE claw / cat%claw IS big -1 (the symbol % means "reference to the attribute of")
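The two modifier exceptions can be sketched as follows, with a hypothetical function that receives an already analysed subject, verb lemma, object and list of modifiers.

```python
def modified_object_relations(subject, verb, obj, modifiers, negated=False):
    """Relations for 'be'/'have' sentences whose object carries modifiers."""
    sign = " -1" if negated else ""          # the negation lands on the modifier only
    if verb == "be":
        # Modifiers describe the subject, not the parent concept.
        return [f"{subject} ISA {obj}"] + [f"{subject} IS {m}{sign}" for m in modifiers]
    if verb == "have":
        # Modifiers describe the attribute of this subject (subject%attribute).
        return [f"{subject} HAVE {obj}"] + [f"{subject}%{obj} IS {m}{sign}" for m in modifiers]
    raise ValueError("only 'be' and 'have' are covered by this sketch")

print(modified_object_relations("cat", "be", "animal", ["big"], negated=True))
print(modified_object_relations("cat", "have", "claw", ["big"], negated=True))
```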

Keep in mind that the parts of the sentence far from the identified objects will be processed by applying point 2.
The cat is nice and good but never eats good food.
Processed by point 3        Processed by point 2.

Likewise, if the subject NP contains prepositional phrases (PP), characteristics are also extracted from them.

with: usually represents attribute relations
a car with big wheels has stopped on the pavement → "car have wheel" + "wheel of car are big"
without: the negation applies only to the modifiers, not to the property
"a man without big hands" → the man has hands, but NOT big hands.
"a man without hands" → in this case the man is assumed to be an instance; this man doesn't have hands, but men in general have hands.
of: with no more information, the OF relation can't be fitted into the attribute list or the feature list.
"the table of good wood is expensive": is the table made of wood (feature), or does the table have wood (attribute)?
With additional knowledge you can conclude that this sentence refers to a table made of wood, but not when only grammatical information is available.
As this scenario is ambiguous, it is stored in the frame as a special case → table OFCALUSE wood
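One possible reading of the three preposition rules is sketched below. Several points are assumptions: the function name, writing "wheel of car are big" with the % notation, and returning nothing for "without" when there are no modifiers (the text only says the man is treated as an instance).

```python
def pp_relations(head, preposition, noun, modifiers=()):
    """Relations extracted from a prepositional phrase attached to a subject."""
    if preposition == "with":
        return [f"{head} HAVE {noun}"] + [f"{head}%{noun} IS {m}" for m in modifiers]
    if preposition == "without":
        if not modifiers:
            return []                       # assumption: an instance, nothing general is learned
        return [f"{head} HAVE {noun}"] + [f"{head}%{noun} IS {m} -1" for m in modifiers]
    if preposition == "of":
        return [f"{head} OFCALUSE {noun}"]  # ambiguous, stored as a special case
    return []

print(pp_relations("car", "with", "wheel", ["big"]))
print(pp_relations("man", "without", "hand", ["big"]))
print(pp_relations("table", "of", "wood"))
```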

Finally, the process also handles relative clauses or "WH" (which, who, that) subphrases
This kind of sentence is processed separately in parts: the main sentence, plus the WH phrase with the relative pronoun replaced by each identified subject
"cats and dogs, that are not robots, are mammals" → "cats and dogs are mammals" + "cats are not robots" + "dogs are not robots"
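The splitting can be illustrated at string level with the sketch below. This is a simplification: the real process works on the constituent tree, and the function name and the comma-delimited pattern are assumptions that only cover a single relative clause.

```python
import re

def split_relative(sentence):
    """Split 'SUBJECTS, that CLAUSE, REST' into a main sentence plus one clause per subject."""
    match = re.fullmatch(r"(.+?), (?:that|which|who) (.+?), (.+)", sentence)
    if not match:
        return [sentence]
    subjects_text, clause, rest = match.groups()
    subjects = [s for s in re.split(r",\s*|\s+and\s+", subjects_text) if s]
    return [f"{subjects_text} {rest}"] + [f"{s} {clause}" for s in subjects]

print(split_relative("cats and dogs, that are not robots, are mammals"))
```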

Some graphical examples of grammatical analysis and their relation to the corresponding internal language code

By design, the parser has some limitations:

With multiple relative clauses, the content of the clause is assumed for all the identified subjects
E.g.: "cats, dogs and birds, that are pets, are nice"
The parser assumes that PET is a relation for CAT, DOG and BIRD
but
E.g.: "the phone, that is a device, and the printer, that is a unit, have power plug"
phone ISA device / printer ISA unit (and also phone ISA device) / + phone and printer HAVE power plug

Of-clauses can process more than one concept; this can be confused with a multiple-subject sentence
"spoons made of iron or wood are used for stirring"
spoon HAVE iron / spoon HAVE wood / spoon%iron CAN stir / spoon%wood CAN stir
but
"teams of basket, companies and workers pay taxes"
the identified subjects are team%basket, team%company and team%worker instead of team%basket, company and worker

Sentences such as "cars have parts composed of some elements" or "keyboards have keys that contain letters"
are not processed due to the complexity of their constituent tree; they should be entered as separate sentences
E.g.: "parts of cars have elements" or "keys of keyboards have letters"

This phase receives the language relations (in internal language format) extracted by the previous phase and processes every internal language code to insert it into the memory structures.
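As a rough illustration only, internal language codes could be loaded into a nested dictionary as shown below. The layout (concept → relation → value → tendency) and the function name are hypothetical; the actual frame memory structures are not described in this section.

```python
from collections import defaultdict

def store(memory, code):
    """Insert one internal language code, e.g. 'cat HAVE wing -1', into memory."""
    parts = code.split()
    tendency = 1                                # positive unless stated otherwise
    if parts[-1] in ("-1", "+1"):
        tendency = int(parts.pop())
    concept, relation, value = parts[0], parts[1], " ".join(parts[2:])
    memory[concept][relation][value] = tendency

memory = defaultdict(lambda: defaultdict(dict))
for code in ["cat ISA animal", "cat HAVE wing -1", "stool%backrest IS large -1"]:
    store(memory, code)

print(memory["cat"]["HAVE"])
```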

Question Answering

Obviously, storing something that can't be retrieved does not make any sense.

The system allows access to the concepts and their characteristics through stereotyped questions such as:

Affirmative questions → Key concept characteristic?:
asks whether a concept has a characteristic. Only affirmative questions are allowed, but the answer can of course be negative. E.g. "Is cat small?" or "Have dog tongue?".
Group questions → What key characteristic?:
asks which concepts have some characteristic. E.g. "What is nice?" or "What animals can fly?".
The symbol "?" is essential when typing in the command line: it is what makes the input be interpreted as a question (information retrieval) instead of a declarative sentence (learning).

These stereotyped questions can be used with ordinary nouns and determiners for more natural usage.
This does not prevent unnatural questions such as "do cat has an legs?"; for the system this is a valid question (= have cat leg?)

Affirmative questions accepted graph:

() | separated list with the valid options
{} means not mandatory

Note: numbers can be written in word format, but they must be written as a single word separated by "-", for example "a-hundred-fifty-six".
      The maximum accepted number is below one million (999.999).
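A number written in this "-" separated word format could be converted as in the following sketch. The function name and the exact vocabulary accepted are assumptions; only the single-word format and the limit below one million come from the note above.

```python
VALUES = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6, "seven": 7,
    "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
    "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}

def words_to_number(text):
    """Convert e.g. 'a-hundred-fifty-six' to 156; reject anything not below one million."""
    total = current = 0
    for word in text.lower().split("-"):
        if word == "a":
            current = current or 1
        elif word in VALUES:
            current += VALUES[word]
        elif word == "hundred":
            current = (current or 1) * 100
        elif word == "thousand":
            total += (current or 1) * 1000
            current = 0
        else:
            raise ValueError(f"unknown number word: {word}")
    number = total + current
    if not 0 < number < 1_000_000:
        raise ValueError("only positive integers below one million are accepted")
    return number

print(words_to_number("a-hundred-fifty-six"))
```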
Examples:

Can the cat run?
Do tigers have 2 eyes?
Is a mammal an animal?
Have the cat four legs?
are nails of legs of a cat nice?
Can walls be jumped?
Does the cat have four legs?
Is stock exchange complex?
Do cities have basketball teams?
are legs of animal cats an upper limbs?
   * when multiple nouns are referenced, the indefinite article is important to split the concepts correctly
  if no indefinite article is given, only the last noun is considered the characteristic
  is leg an upper limb? → concept = leg / characteristic = upper limb
  is leg upper limb? → concept = leg upper / characteristic = limb
is person a person? → note the self identity scenario.
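The role of the indefinite article when splitting concept and characteristic can be sketched like this. The function is hypothetical and only covers "is ...?" questions; dropping a leading article from the concept is also an assumption.

```python
ARTICLES = ("a", "an")

def split_is_question(question):
    """Return (concept, characteristic) for a question such as 'is leg an upper limb?'."""
    words = question.strip().rstrip("?").lower().split()[1:]   # drop the leading "is"
    if words and words[0] in ARTICLES:                          # leading article of the concept
        words = words[1:]
    for i, word in enumerate(words):
        if word in ARTICLES and 0 < i < len(words) - 1:
            return " ".join(words[:i]), " ".join(words[i + 1:])
    # No indefinite article: only the last noun is the characteristic.
    return " ".join(words[:-1]), words[-1]

print(split_is_question("is leg an upper limb?"))
print(split_is_question("is leg upper limb?"))
```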

Group questions accepted graph:

() | separated list with the valid options
{} means not mandatory

Notes:

In this kind of question the object is searched in Deep Search mode even if that mode is set to off.

Standalone negative questions are allowed, such as "what is not nice?"

In multiple conditions, if the "key" is not defined after a logical operator, the question assumes that the condition has the same "key" as the prior condition.
Therefore "what is big and large?" is equal to "what is big and is large?"
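The inheritance of the "key" between conditions could look like the sketch below. The set of key words and the function name are assumptions made for the example, since the accepted graph is not reproduced here.

```python
import re

# Assumed key words; the real list comes from the accepted question graph.
KEYS = {"is", "are", "has", "have", "can"}

def expand_keys(question):
    """Repeat the previous key in every condition that does not define its own."""
    body = question.strip().rstrip("?")
    parts = re.split(r"\s+(and|or)\s+", body)     # the operators are kept in the list
    expanded, key = [], None
    for part in parts:
        if part in ("and", "or"):
            expanded.append(part)
            continue
        words = part.split()
        found = next((w for w in words if w in KEYS), None)
        if found:
            key = found                           # remember the key for later conditions
        elif key:
            words.insert(0, key)                  # inherit the key of the prior condition
        expanded.append(" ".join(words))
    return " ".join(expanded) + "?"

print(expand_keys("what is big and large?"))
```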

Examples:

asking for parent relations: which are a feline?
for features: what is red?
for attributes: What has legs?
numbered attributes: what have thousand cells?
asking for skills: what animals can run?
for affected actions: which animal can be hunted?
using multiple condition for features: what is a hard disk? / which hard disk is writable?
negative conditions: what animal has not wings?
multiple conditions: what is animal and savage and live or pet?
object guessing : what animal is a pet and has 4 legs and can eat or can hunt and has fur and is black?

Numbered attribute questions accepted graph:

() | separated list with the valid options
{} means not mandatory

Notes:

The answer to these questions can be:
- "None" → in case the object has no relation with the attribute.
- "Any" → if the object has the attribute, but no numbers have been mentioned for the relation.
- "comma separated list of numbers" → the object has the relation with associated numbers; the list is returned (ascending sort, no duplicates)
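The three possible answers can be illustrated with a small sketch over a hypothetical memory layout (concept → attribute → list of numbers seen); neither the layout nor the function name is taken from the actual system.

```python
def how_many(memory, concept, attribute):
    """Answer a numbered attribute question with 'None', 'Any' or a sorted number list."""
    numbers = memory.get(concept, {}).get(attribute)
    if numbers is None:
        return "None"                               # no relation with the attribute
    if not numbers:
        return "Any"                                # relation known, no numbers mentioned
    return ", ".join(str(n) for n in sorted(set(numbers)))

memory = {"cat": {"leg": [4, 4], "claw": [18, 5], "tail": []}}
print(how_many(memory, "cat", "leg"))
print(how_many(memory, "cat", "tail"))
print(how_many(memory, "cat", "wing"))
```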

See here how it is searched in memory.

Examples:

how many legs has the cat?
how many fingers does a hand of person have?

Interactions question accepted graph:

() | separated list with the valid options
{} means not mandatory

Notes:

The answer to these questions can be:
- "None" → in case no interaction has been found or fits the conditions of the search.
- "comma separated list" of the concepts that have the asked interaction.

Remember that the negative particle affects only the interaction, not the relation, so the concept must have a positive skill relation with the verb while the interaction has the negative tendency.

See here how it is searched in memory.

Examples:

what do lizards eat?
what does the fox say?
what does a roar of lion king sounds?
what don't humans eat?
what can eat humans?
what can made an hydraulic Jack?

Interactions affirmative question accepted graph:

() | separated list with the valid options
{} means not mandatory
" " means a word whose lemma is the word between the quotes

Notes:

The answer to these questions will be "Yes" if, and only if:
     1 - the concept has the action verb in its skill list with positive tendency.
     2 - the skill has the indicated receiver in its interaction list with positive tendency.
     3 - the receiver has the indicated action verb in its affected action list with positive tendency.
If any of these conditions fails, the answer will be "No".
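The three conditions can be checked as in this sketch. The two dictionaries and their layout are hypothetical, and the sketch ignores inheritance searches and the confirmation mechanism.

```python
def can_interact(skills, affected, concept, verb, receiver):
    """Answer 'Yes' only when the three conditions listed above hold."""
    skill = skills.get(concept, {}).get(verb)
    if skill is None or skill["tendency"] <= 0:            # 1: positive skill
        return "No"
    if skill["interactions"].get(receiver, 0) <= 0:        # 2: positive interaction
        return "No"
    if affected.get(receiver, {}).get(verb, 0) <= 0:       # 3: positive affected action
        return "No"
    return "Yes"

# Hypothetical memory contents for "can a cat jump the fence?"
skills = {"cat": {"jump": {"tendency": 1, "interactions": {"fence": 1, "wall": -1}}}}
affected = {"fence": {"jump": 1}, "wall": {"jump": 1}}

print(can_interact(skills, affected, "cat", "jump", "fence"))
print(can_interact(skills, affected, "cat", "jump", "wall"))
```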

As this kind of question has the same properties as the affirmative question, the confirmation mechanism is also activated and acts in the same defined way.

Negative particles are not accepted in the graph, because if the relation is negative then the answer will be negative.

Due to the attribute inheritance search, Of-clauses are not allowed, as they are not necessary to ask.

See here how it is searched in memory.

Examples:

can a cat jump the fence?
could screens show 8-bit images?
can dot printers print colors?

Considerations

1. The model is focused on simple declarative sentences, not on speeches, discussions or any other (large) context-dependent text.

2. Stanford CoreNLP is quite sensitive to language errors; therefore syntax errors must be avoided as much as possible.

3. As verb tense has no importance in the declarative memory, only the infinitive of the verb is needed: Jumped or jumping → jump

4. Auxiliary and modal verbs have no semantic value and are therefore discarded, except "to be" and "to have", because they are descriptive.
  "The cat should have eaten": as "should" is a modal and "have" is an auxiliary in this sentence, the verb that acts as main verb is "eaten" (cat can eat)

5. This phase doesn't interpret specific or proper "objects".
  Personal pronouns and proper nouns are therefore translated into the concept they represent.
  New York, Barcelona or Japan → location
  We, John → person
  ONU, FBI → organization

  These translations are known as NER (named entity recognition) and are provided by Stanford CoreNLP
  "Jimmy jumps"; Jimmy is a person → person can jump

6. The same sentence can have multiple constituent trees (which are different valid interpretations); unfortunately, depending on how the sentence is formed (or on which tree Stanford CoreNLP provides), the results may differ.

  In this sentence the system doesn't detect "cats have 4 legs" because "four" is outside the noun phrase of the concept:
  The four legged cats: (ROOT (NP (NP (DT The) (CD four)) (NP (JJ legged) (NNS cats))))

  Instead, these sentences are captured correctly
  the four legs of cats: (NP (NP (DT the) (CD four) (NNS legs)) (PP (IN of) (NP (NNS cats))))
  cats has 4 legs: (S (NP (NNS cats)) (VP (VBZ has) (NP (CD 4) (NNS legs))))

7. Questions must be formulated in the affirmative.

   You can still receive negative answers. Instead of asking "Are cat not big?", ask "are cat big?" → No

   For groups (object guessing), negative questions are allowed.

See also the list of considerations and limitations of the model and application.