Syntax 380L
September 12, 2003

Phrase Structure Rules
Structure within the NP

1 Definitions

(1) a tree for ‘the brown fox sings’

    [A [B [D the]
          [E [F brown] [G fox]]]
       [C sings]]

Linguistic trees have nodes. The nodes in (1) are A, B, C, D, E, F, and G. There are two kinds of nodes: internal nodes and terminal nodes. The internal nodes in (1) are A, B, and E. The terminal nodes are C, D, F, and G. Terminal nodes are so called because they are not expanded into anything further. The tree ends there. Terminal nodes are also called leaf nodes. The leaves of (1) are really the words that constitute the sentence ‘the brown fox sings’, i.e. ‘the’, ‘brown’, ‘fox’, and ‘sings’.

(2) a. A set of nodes forms a constituent iff they are exhaustively dominated by a common node.
    b. X is a constituent of Y iff X is dominated by Y.
    c. X is an immediate constituent of Y iff X is immediately dominated by Y.

Notions such as subject, object, prepositional object etc. can be defined structurally. So a subject is the NP immediately dominated by S, an object is an NP immediately dominated by VP, etc.

(3) a. If a node X immediately dominates a node Y, then X is the mother of Y, and Y is the daughter of X.
    b. A set of nodes are sisters if they are all immediately dominated by the same (mother) node.

We can now define a host of relationships on trees - grandmother, granddaughter, descendant, ancestor etc.

Another important relationship that is defined in purely structural terms is c-command.

(4) A c-commands B if and only if A does not dominate B and the node that immediately dominates A dominates B.

c-command is used in the formulation of Condition C, a principle used to determine what a pronoun may not refer to.

CONDITION C
(5) A pronoun cannot refer to a proper name it c-commands.

Note that Condition C is a negative condition. It never tells you what a particular pronoun must refer to. It only tells you what it cannot refer to.
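The definitions in (3) and (4) are mechanical enough to be checked by a short program. What follows is only a minimal sketch in Python, not part of the original handout: the dictionary encoding of tree (1) and the function names mother, dominates and c_commands are assumptions made for illustration. The sketch also assumes that a node does not c-command itself, a case that definition (4) leaves open.

```python
# Tree (1) for 'the brown fox sings', encoded as mother -> daughters.
# This encoding is an assumption of the sketch, not the handout's notation.
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "E": ["F", "G"],
}


def mother(node):
    """Return the node that immediately dominates `node` (None for the root)."""
    for parent, daughters in TREE.items():
        if node in daughters:
            return parent
    return None


def dominates(x, y):
    """True if x dominates y, i.e. x is the mother of y or a higher ancestor."""
    m = mother(y)
    while m is not None:
        if m == x:
            return True
        m = mother(m)
    return False


def c_commands(a, b):
    """Definition (4): a does not dominate b, and a's mother dominates b."""
    if a == b:
        # Assumption: exclude the reflexive case, which (4) does not address.
        return False
    m = mother(a)
    return m is not None and not dominates(a, b) and dominates(m, b)


if __name__ == "__main__":
    for pair in [("D", "F"), ("F", "D"), ("C", "G"), ("B", "D")]:
        print(pair, c_commands(*pair))
```

On this encoding D c-commands F and G (its mother B dominates both), while F does not c-command D (its mother E does not dominate D). B does not c-command D because B dominates D.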
In general, if a pronoun cannot refer to a proper name (despite agreeing in gender and number), you can conclude that the pronoun c-commands the proper name.

The NO CROSSING BRANCHES CONSTRAINT
(6) If one node X precedes another node Y, then all descendants of X must also precede Y and all descendants of Y.

2 How to grow trees

Where do the trees that we use to analyze linguistic structure come from? In a way, they are just representations of facts that exist out in the world - the facts that we can discover using constituency tests. So one way to make trees is by doing empirical work - taking a sentence, applying various constituency tests to the words in the sentence, and then drawing a tree based on the results of our tests.

This empirical method is ultimately the only correct way to deduce ‘tree structure’. However, in most cases, we can simplify things considerably by using Phrase Structure Rules.

Phrase Structure Rules are rules of the sort

    X → Y Z

This rule says ‘take the node X and expand it into the nodes Y and Z’. Alternately, going from right to left (or from below), it says ‘if you have a Y and a Z next to each other, you can combine them to make an X’.[1]

Phrase structure rules can be categorial, i.e. rules that expand categories into other categories, or they can be lexical, i.e. rules that expand category labels into words (lexical items).[2]

A grammar can then be thought of as a set of phrase structure rules (categorial rules plus lexical rules). The categorial rules can be thought of as (part of) the syntax and the lexical rules as (part of) the lexicon.

[1] Grammars built from such phrase structure rules are called Context Free Grammars (CFGs) and were invented by Noam Chomsky in 1956. A closely related model was used by Pāṇini to describe the grammar of Sanskrit in around 500 B.C.
[2] Why must we have at least some lexical rules?

2.1 Some Phrase Structure Rules for English

(7) Categorial Rules
    a. S → NP Modal VP
    b. VP → V AP PP
    c. AP → ADVP A
    d. ADVP → ADV
    e. PP → P NP
    f.
       NP → D N

(8) Lexical Rules
    a. N → girl
    b. N → boy
    c. ADV → incredibly
    d. A → conceited
    e. V → seem
    f. Modal → must
    g. P → to
    h. D → that
    i. D → this

Some sentences these rules will generate:

(9) a. This boy must seem incredibly conceited to that girl.
    b. This boy must seem incredibly conceited to this girl.
    c. This boy must seem incredibly conceited to that boy.
    d. This boy must seem incredibly conceited to this boy.
    e. This girl must seem incredibly conceited to that girl.
    f. This girl must seem incredibly conceited to this girl.
    g. This girl must seem incredibly conceited to that boy.
    h. This girl must seem incredibly conceited to this boy.

How many more sentences will these rules generate?

Optional constituents

How do we handle cases like:

(10) This boy must seem incredibly stupid.

2.2 Introducing infinity

We know that human languages can contain sentences of arbitrary length. Consider (11), which stands for an infinite number of sentences.

(11) He believes that he believes that he believes that he believes that ... he ate pizza.

So if all of human language is to be generated by a set of phrase structure rules, the relevant set of phrase structure rules should generate an infinite number of sentences. How can that be done?

Let us try to analyze (11), starting with the more manageable (12).

(12) He believes that he ate pizza.

We start with the following categorial rules:

(13) a. S → NP VP
     b. VP → V S̄
     c. S̄ → COMP S
     d. VP → V NP

We need the following lexical rules:

(14) a. NP → he
     b. NP → pizza
     c. V → ate
     d. V → believes
     e. COMP → that

Now we can generate (12). This is shown in (15).

(15) [S [NP he]
        [VP [V believes]
            [S̄ [COMP that]
               [S [NP he]
                  [VP [V ate] [NP pizza]]]]]]

But is (12) all that the rules in (13) and (14) will generate? How many sentences will (13) and (14) generate?

2.2.1 Overgeneration

The rules in (13) and (14) will also generate sentences (see the structure below) like:

(16) *He ate that he believes pizza.
    [S [NP he]
       [VP [V ate]
           [S̄ [COMP that]
              [S [NP he]
                 [VP [V believes] [NP pizza]]]]]]

How can we constrain phrase structure rules so that such overgeneration does not take place?

3 Noun Phrases

So far, we have seen two kinds of categories: word-level categories such as N, V, A, P etc. (somewhat imprecisely, words) and phrase-level categories such as NP, VP, AP, PP etc. (somewhat imprecisely, sequences of words which can ‘stand on their own’).

We will now investigate whether these two kinds of categories are all we need, or whether we need a third category which lies in between words and full phrases.

Consider the following NP:

(17) the king of England

We feel quite confident saying that ‘the king of England’ is an NP. What else can we say about its structure?

There seems to be a lot of evidence that ‘of England’ is a PP. It can be co-ordinated, and it can be the shared constituent in shared constituent co-ordination. It can also function as a sentence fragment and be preposed.

(18) a. the king [PP of England] and [PP of the empire] (coordination)
     b. He is the king, and she is the queen, [PP of England]. (shared constituent coordination)
     c. A: Was he the king of Livonia? B: No, [PP of England]. (sentence fragment)
     d. [PP Of which country] was he the king? (preposing)

At this point we have two options:

(19) [NP [D the] [N king] [PP [P of] [NP England]]]

(20) [NP [D the] [?? [N king] [PP [P of] [NP England]]]]

There is evidence from constituency tests that the sequence of words ‘king of England’ forms a constituent.

‘king of England’ can undergo co-ordination with another similar sequence.

(21) Vivian dared defy the [king of England] and [ruler of the Empire]?

‘king of England’ can serve as the shared constituent in shared constituent co-ordination.

(22) Edward was the last, and some people say the best, [king of England].

There is a proform that replaces sequences like ‘king of England’.

(23) The present [king of England] is more popular than the last one.

So ‘king of England’ forms a constituent that excludes ‘the’.
Thus we have evidence for the tree in (20). This evidence doesn't actually rule out the tree in (19). It is not easy to rule out (19) on the basis of the discussion so far. However, an assumption that natural language structures only involve binary branching could be used to block structures like (19).

3.1 What kind of constituent is ‘king of England’?

In other words, what is the name of the node labeled ?? in (20)?

Let us assume that it is an NP. We find that this assumption is problematic in many ways.

‘king of England’ does not have the distribution of ‘normal’/‘full’ noun phrases. Normal NPs can occur in subject position, in object position, and as a prepositional object. ‘king of England’ cannot appear in any of these positions.

(24) a. subject:
        i. [The king of England] invaded several countries.
        ii. * [King of England] invaded several countries.
     b. object:
        i. I saw [the king of England] on the T yesterday.
        ii. * I saw [king of England] on the T yesterday.
     c. prepositional object:
        i. I didn't give any money to [the king of England].
        ii. * I didn't give any money to [king of England].

Consider the tree for ‘the king of England’ under the assumption that ‘king of England’ is also an NP.

(25) [NP [D the] [NP [N king] [PP [P of] [NP England]]]]

From this tree, we can read off the phrase structure rules involved in building it. They are shown in (26).

(26) a. Categorial Rules:
        i. NP → D NP
        ii. NP → N PP
        iii. PP → P NP
     b. Lexical Rules:
        i. D → the
        ii. N → king
        iii. P → of
        iv. NP → England

Note in particular the categorial rule (26a.i). It has the unusual property that it expands a node label into itself. Such rules are called recursive and this phenomenon is called recursion.[3]

So we can go from NP to [D NP] to [D D NP] to [D D D NP] and so on. In principle, using the rules in (26), we can generate NPs like those in (27).

(27) a. * the the king of England
     b. * the the the king of England
     c.
        * the ... the the the king of England

Now, it is very clear that none of the NPs in (27) are good noun phrases in English. From this we can conclude that the categorial rule (26a.i), which is the source of the recursion, cannot be correct.

So: ‘king of England’ cannot be an NP and yet ‘king of England’ is a constituent of some sort. Let us call nominal constituents that are bigger than words but still not full phrases N̄ (or N′ or N-bar). Our tree now becomes:

(28) [NP [D the] [N′ [N king] [PP [P of] [NP England]]]]

NPs are sometimes called N-double-bars or N″. Ns are sometimes called N⁰.

[3] We have already seen a case of recursion, though there the recursion was in two steps.

4 Complements and Adjuncts

Consider the phrase structure rules responsible for generating (28):[4]

(29) a. NP → D N′
     b. N′ → N PP
     c. PP → P NP

[4] From now on, we will only consider the categorial rules. The lexical rules are straightforward.

We see that D combines with an N′ to its right and forms an NP. Similarly, P combines with an NP to its right and forms a PP.

Likewise, (29) says that an N combines with any PP that follows it (i.e. any postnominal PP) and forms an N′. But is this really the case? Do the PPs in (30a, b) have the same relation to the N?

(30) a. a student [of Physics]
     b. a student [with long hair]

It seems not. Consider the following pattern:

(31) a. i. He is [a student of Physics].
        ii. = He is [studying Physics].
     b. i. He is [a student with long hair].
        ii. ≠ He is [studying long hair].

PPs like ‘of Physics’ are called complements, while PPs like ‘with long hair’ are called adjuncts. Corresponding to this difference in terminology, a structural difference is also proposed. This is shown in (32).

(32) [N″ Determiner [N′ [N′ N Complement] Adjunct]]

In terms of phrase structure rules this is:

(33) a. N″ → D N′ (Determiner Rule)
     b. N′ → N′ PP (Adjunct Rule)
     c. N′ →
          N PP (Complement Rule)

The rules in (33) make a prediction - if an NP contains both a complement PP and an adjunct PP, the complement PP should precede the adjunct PP. This prediction turns out to be true.

(34) a. the student [of Physics] [with long hair]
     b. * the student [with long hair] [of Physics]

4.1 Optional Constituents of the Noun Phrase

Do all NPs have to contain a determiner, a noun, a complement PP, and an adjunct PP? Well, they have to contain an N, otherwise they wouldn't be NPs. What about the others?

Consider the rules in (33). If you wanted to make an NP, would it be necessary to apply the Adjunct rule? You could take an N and a complement PP and make an N′. Then you could combine the N′ with an adjunct PP to make another N′. You could, but you don't have to. You can just combine your adjunct-less N′ with a D on its left to make an NP. So, NPs don't have to contain adjuncts. In other words, the adjunct rule is an optional rule.

Still, the rules in (33) insist that every NP must have a determiner (the Determiner rule) and a complement PP (the Complement rule). This is, however, just false.

(35) a. the student
     b. the student with long hair

(35a) is an NP without a complement PP, and (35b) shows that an NP without a complement PP can still take an adjunct PP.

How can we modify our phrase structure rules to handle these cases? For this purpose, we will introduce new notation: (A) means that A is optional. So we can now change our complement rule from (36a) to (36b).

(36) a. N′ → N PP (Old Complement Rule)
     b. N′ → N (PP) (New Complement Rule)

We also find optionality of determiners, cf. (37a-e).

(37) a. cheese from Greece
     b. students
     c. students with long hair
     d. students of physics
     e. students of physics with long hair

However, this optionality is lexically determined, i.e. it only works for certain nouns - noncount nouns and plural count nouns, but not singular count nouns.[5]

(38) a. * Student likes pizza
     b.
        * Student with long hair likes pizza

For such nouns, we can modify the determiner rule in the way we modified the complement rule in (36).

(39) a. NP → D N′ (Old Determiner Rule)
     b. NP → (D) N′ (New Determiner Rule)

However, we have to think of a way to block the NPs in (38) from being generated.

4.2 Non-branching Phrases

Consider the new complement rule:

(40) N′ → N (PP) (New Complement Rule)

This rule is really equivalent to the following two rules:

(41) a. N′ → N PP
     b. N′ → N

(41a) is nothing new. (41b) is definitely new. It tells us something unexpected. According to (41b), ‘student’ in ‘the student’ is both an N′ and an N, while ‘student’ in ‘the student of Physics’ is only an N, not an N′. Similarly, ‘student’ in ‘the student with long hair’ should be both an N′ and an N.

We can check to see if these predictions are true. The test we will use is substitution by the N′ pro-form ‘one’.

(42) a. The [student] with long hair is dating the one with short hair.
     b. This [student] works harder than that one.
     c. * The [student] of chemistry was older than the one of Physics.

What can co-ordination tell us here?

[5] How can we be sure that the word sequences in (37) are NPs?

4.3 A bit more on N′

Both (43a, b) are responsible for the creation of N′s.

(43) a. N′ → N′ PP (Adjunct Rule)
     b. N′ → N PP (Complement Rule)

How can we be sure that the node created by the complement rule isn't N-bar1 and the node created by the adjunct rule N-bar2? Again by constituency tests: we know that only like categories can be co-ordinated, and we find that N′s created by the two different rules can be co-ordinated.

(44) the [ [students of Chemistry with long hair] and [professors of Physics] ]

In addition, the pro-N′ ‘one’ can refer to N′s created by either rule.

(45) a. Which [student of Physics]? The one with long hair?
     b. Which [student of Physics with long hair]? That one?

Hence we can conclude that the ‘output’ of both rules is indeed one kind of node, which we call N′.
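
The New Complement Rule (36b), the Adjunct Rule (33b) and the New Determiner Rule (39b) can be tried out as small tree-building functions. The Python below is only a sketch and is not part of the original handout. The nested-list encoding of trees and the names build_n_bar, build_np, words and n_bar_yields are assumptions made for illustration. Whether a given PP is a complement or an adjunct is supplied by hand, since the rules themselves do not decide it.

```python
def build_n_bar(noun, complement=None, adjuncts=()):
    """Build an N' from a noun, an optional complement PP and adjunct PPs."""
    # New Complement Rule (36b): N' -> N (PP)
    tree = ["N'", ["N", noun]]
    if complement is not None:
        tree.append(["PP", complement])
    # Adjunct Rule (33b): N' -> N' PP, applied once per adjunct
    for adjunct in adjuncts:
        tree = ["N'", tree, ["PP", adjunct]]
    return tree


def build_np(n_bar, det=None):
    """New Determiner Rule (39b): NP -> (D) N'."""
    tree = ["NP"]
    if det is not None:
        tree.append(["D", det])
    tree.append(n_bar)
    return tree


def words(tree):
    """Return the leaves of a tree from left to right."""
    leaves = []
    for child in tree[1:]:
        if isinstance(child, str):
            leaves.append(child)
        else:
            leaves.extend(words(child))
    return leaves


def n_bar_yields(tree):
    """Return the word string under every N' node, i.e. what 'one' may replace."""
    found = [" ".join(words(tree))] if tree[0] == "N'" else []
    for child in tree[1:]:
        if not isinstance(child, str):
            found.extend(n_bar_yields(child))
    return found


if __name__ == "__main__":
    full = build_np(build_n_bar("student", "of Physics", ["with long hair"]), "the")
    print(" ".join(words(full)))   # complement precedes adjunct, as in (34a)
    print(n_bar_yields(full))
    print(n_bar_yields(build_np(build_n_bar("student", "of Physics"), "the")))
```

Because the complement is attached inside the lowest N′ and adjuncts are attached outside it, the complement always comes out before the adjunct, as in (34a). In ‘the student of Physics’ the only N′ is ‘student of Physics’, so bare ‘student’ is not available for replacement by ‘one’, which matches (42c). In ‘the student with long hair’ both ‘student’ and ‘student with long hair’ are N′s, which matches (42a).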