Chapter 3: Methodology

Track 4 on Linguistic Variation in Hip Hop: Variable Use of African American Vernacular English by New York Rappers Jay-Z and Nas

3. METHODOLOGY
This section will outline the study’s research design and describe how it was conducted. It will also, using statistical tests and quantitative analysis, evaluate the representativeness of the data collected and the reliability of the research methods as a whole.
Section 3.1 outlines the motivation for selecting the two albums which form the basis for the primary data, while section 3.2 details the nature of this data. Sections 3.3 - 3.7 explain the process of data collection and extraction: how the variables were selected, how they were coded for (according to their variable context and the principle of accountability (Labov 1972)), how tokens were extracted, and how tokens were excluded. Section 3.8 outlines the chi-square test used to determine the statistical significance of results.

This study takes the lyrics of two seminal Hip Hop albums as its primary data. These albums are Jay-Z’s Reasonable Doubt and Nas’s Illmatic.

3.1. Selection of Artists
Through the selection of artists used in the study, it was possible to control certain variables. Firstly, geographical location provided too much scope for variation for this study to deal with, and also caused problems in identifying a comparable variety of English to AAVE (Wolfram & Schilling-Estes 1998). Therefore, it was preferable to select two artists born and raised in a similar area. Secondly, this study is not concerned with variation across different age groups, so two artists of a similar age were selected. Thirdly, artists of the same gender were chosen, so that gender would not constrain variation in this study. Fig. 3 (below) shows how both rappers meet these criteria.

It was also preferable to select two artists whose music is generally well-respected and is representative of the art form that is Hip Hop. The hypothesis that AAVE is used to construct the identity, and reflect the socio-historical background, of the Hip Hop movement can only be tested against the lyrics of artists that fully represent this movement. The general consensus in the Hip Hop community that Jay-Z and Nas fit this criterion is backed up by Hodges Persley (2011). In this way, any valid statements about the AAVE speech of Jay-Z and Nas will hopefully apply to the rest of the Hip Hop community. Of course, every speaker is different, and a multitude of social and linguistic factors affect their dialectal choices, but the more representative the speakers are of Hip Hop culture, the more valid any generalisations made about their speech will be.

Another, less vital (but no less interesting), reason for the selection of Jay-Z and Nas is that a public debate exists as to whose debut album is better, and as to who is the better rapper as a whole. This debate is not linguistic, and as such has no real relevance here. However, it brings an extra dimension to the value of this research: a linguistic account of the extent to which each artist represents his African-American cultural heritage may have a bearing on who is considered to be the most ‘real’.

3.2. Sourcing the Primary Data
This study uses the lyrics from each of the artists’ debut albums (Reasonable Doubt – Jay-Z and Illmatic – Nas) as its primary data. Listening to the albums and transcribing the lyrics would have been a laborious and error-prone process. Instead, the lyrics were taken from Rap Genius (www.rapgenius.com). The website allows users to upload the lyrics to rap songs themselves; these are corrected or improved upon by other users, and the changes are accepted or rejected by the original user. The lyrics are then moderated by a team of editors, who check the transcriptions for accuracy. This allows for much greater accuracy than any one person could achieve. The website also has its own style guide, meaning that all submissions are transcribed in a uniform fashion. Another advantage of using Rap Genius is that explanations of the lyrics can also be uploaded and improved upon or corrected by the users. This meant that the semantic content of the lyrics (which, in some cases, affected data collection) could be determined with ease, and was particularly useful in determining the meaning of slang words and phrases.
However, using a user-controlled website such as Rap Genius has obvious limitations. The users themselves are by no means professional linguists, and their transcriptions are by no means always correct. Bearing this in mind, the lyrics were examined with great scrutiny to check for errors. Very few were found, and those that were found were corrected by me. Again, this method is not fool-proof, but without direct transcription from the artists themselves I feel that this was the best possible method. In the case of Reasonable Doubt, it was sometimes possible to cross-reference lyrics with the ones printed in Jay-Z’s autobiography Decoded (Carter 2010), which are explained in detail by Jay-Z himself.

3.3. Linguistic Variables
A number of features were unsuitable for use as variables as they were specific to one geographical area (e.g. use of [oɪ] in place of SAE [oʊ] is exclusive to the Southern States of the USA (Green 2002)), so these were discounted. Also, fine-grained phonological distinctions between vowel sounds (for example, relative backing of the nucleus in /ai/ and relative fronting of the back vowels, as noted by Wolfram & Thomas (2002: 170)) were not coded for. This is for two reasons: firstly, because they are difficult to distinguish in the rapid speech stream of the rap itself, particularly in light of the background music; and secondly, because vowel sounds are often changed to fit a certain rhyme scheme and therefore may not be representative of natural AAVE speech.
The extant literature identifies a wide variety of features that are ‘typical’ of AAVE speech. In order to select a manageable number of only the ‘most typical’ AAVE features, this study cross-referenced a list of eight features in the work of Wolfram & Schilling-Estes (1998: 171) with the findings of the rest of the extant literature. They identify eight features (adapted from Fasold (1981)) that they consider unique to AAVE (see fig. 4, below). According to Wolfram & Schilling-Estes: “Based upon a careful review of research studies up to that point, Fasold concluded that the following structures of AAVE [see fig. 4, below] were the best candidates as unique features of AAVE” (1998: 171).

Their analysis that these eight features represented “the best candidates as unique features of AAVE” (1998: 171) proved consistent with the literature, and having further adapted Fasold’s list in light of more recent research, this study is inclined to agree, and takes the same eight features as the coding variables for data collection:
1. Devoicing of voiced stops in stressed syllables (also cited by Bailey & Thomas (1998), Zeigler (2009))
2. Present tense, 3rd person –s absence (also cited by Wolfram & Thomas (2002: 171), Wolfram & Schilling-Estes (1998: 272), Dillard (1973: 40), Baugh (1983), Zeigler (2009))
3. Plural –s absence (also cited by Harrison (1975: 158), Labov (2012), Baugh (1983), Zeigler (2009))
4. Use of remote time stressed been (also cited by Dillard (1972), Rickford (1975), Baugh (1983), Labov (2012), Bailey & Thomas (1998), Labov (1998), Zeigler (2009))
5. Absence of possessive –s (also cited by Harrison (1975: 158), Labov (2012: 147), Baugh (1983), Zeigler (2009))
6. Reduction of final consonant clusters (also cited by Wolfram & Thomas (2002), Zeigler (2009), Wise (1975), Labov (2012: 144), Baugh (1983), Smitherman (2000), Mufwene & Rickford (1998), Bailey & Thomas (1998))
7. Copula/aux deletion involving is forms (also cited by Wolfram & Thomas (2002), Labov (2012: 48), Dillard (1992: 21), Zeigler (2009), Walker (2000), Rickford (1991, 1998), Baugh (1983), Smitherman (2000), Mufwene & Rickford (1998), Bailey & Thomas (1998))
8. Habitual be (also cited by Labov (1969, 1998, 2012), Dillard (1972), Zeigler (2009), Fasold (1969, 1972), Stewart (1969), Fickett (1970), Wolfram (1969), Baugh (1983), Smitherman (2000), Mufwene & Rickford (1998), Bailey & Thomas (1998))

For the purposes of this study, defining a carefully selected coding schema of AAVE features is of critical importance: the chosen features must be the most typical of AAVE speech in order that the results may prove significant. However, Rickford (1999: 12) points out that “AAVE is not simply a compendium of features, but the integral whole which Brown evocatively called ‘Spoken Soul’”. To reduce AAVE to a small number of specific features is to neglect many of its cultural aspects (tonal semantics, call-response communication, signifying and so on). The scope of this study, unfortunately, is not wide enough to take these into account. The eight features selected therefore represent the best-case scenario for neatly characterizing AAVE speech.

3.4. Circumscribing the Variable Context
Before analysis of AAVE/SAE variation can begin, it must be established which of the features listed above are subject to variation, and “precisely how and where in the grammatical system a particular linguistic variable occurs” (Tagliamonte 2006: 86). The extant literature (see above) cites the following as being variable in the grammatical system of AAVE:

1. voiced stops in stressed syllables
AAVE: Good (realised as [ɡɔʈ]) me either
(Jay-Z Reasonable Doubt)
SAE: I’ll make your block infrared hot
(Jay-Z Reasonable Doubt)
2. present tense, third person –s
AAVE: Jay-Z rise Ø ten years later
(Jay-Z Reasonable Doubt)
SAE: When it boils to steam
(Jay-Z Reasonable Doubt)
3. plural –s deletion on the general class of noun plurals, where a.) –s would result in a syllable-final cluster and b.) number reference is made clause internally
AAVE: True in the game, as long as blood is blue in my vein Ø
(Nas Illmatic)
SAE: Laughing at baseheads
(Nas Illmatic)
4. use of remote time stressed ‘been’ to mark an action that took place a long time ago and is still relevant
AAVE: I been known him a long time
SAE: I have known him for a long time
(Fasold 1981)
5. possessive –s absence
AAVE: Man Ø hat for Man’s hat
(Fasold 1981)
SAE: Or caught by the devil’s lasso
(Nas Illmatic)
6. reduction of final consonant clusters when followed by a word beginning with a vowel
AAVE: Load up the mic’ and bust one (realised as [ˈbəs ˈwən])
(Nas Illmatic)
SAE: Niggas deceased or behind bars
(Nas Illmatic)
7. copular and auxiliary deletion involving is forms
AAVE: she Ø a snake too
(Nas Illmatic)
SAE: It’s like that
(Nas Illmatic)
8. use of habitual be
AAVE: Be having dreams that I’m a gangsta
(Nas Illmatic)
SAE: Every afternoon, I kick half the tune
(Nas Illmatic)

Variable number three has been further adapted in light of distinctions made by Poplack & Tagliamonte (1994), who observed that AAVE speakers only delete plural –s where the –s would result in a syllable-final consonant cluster, and where reference to the plurality of the noun is made clause-internally. Adapting the variable in this way means that instances of plural –s deletion that do not fit these criteria are not counted, and so cannot sway the results.

3.5. The Principle of Accountability
This study was conducted in accordance with the ‘Principle of Accountability’, which states that linguists are duty-bound to “report values for every case where the variable element occurs in the relevant environments as we have defined them” (Labov 1972: 72). Failure to do so would leave the linguist unable to account for their results as observations of natural speech. Further to this, in order to achieve full accountability, values must also be recorded for the environments in which the token could have occurred, but did not: ‘Ø’ is therefore effectively considered a variant. In order to be consistent with this ‘revised’ principle of accountability, the study also recorded these Ø variants, aiming to “report the number of occurrences of a feature out of the total number of cases in which it could have occurred” (Rickford 1986: 41).

3.6. Extracting Linguistic Variables from the Primary Data
Having established a list of variables (section 3.3, above) and a corpus of primary data from Rap Genius (section 3.2, above), data extraction could begin. The data was collected manually: equipped with a list of eight variables (see section 3.4, above) and hard copies of the lyrics of both artists, it was possible to annotate the lyrics with each variant produced whilst listening to each album in turn. Firstly, the AAVE variants of each feature were highlighted, before a second pass was made in which the SAE variants were highlighted. The AAVE and SAE variants (numbered as in fig. 5, above) were highlighted in different colours, and phonetically transcribed where necessary. This process was repeated twice more in order to ensure that no variants were missed. Often, a great deal of scrutiny was necessary to establish exactly which variant was being produced, and in some cases this was actually impossible (see section 3.7 for discussion of ‘Neutralisation Contexts’).
Once all tokens had been extracted in this way, they were tabulated, at which point percentage frequencies of each variant could be calculated (see figs. 6 & 7, below). The tokens (both the SAE and the AAVE variants) of each variable were added together to give a total frequency for each. The number of AAVE variants was then divided by this total, to yield the percentage of AAVE tokens out of all possible occurrences, as per the ‘Principle of Accountability’ (see section 3.5, above).
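The percentage calculation described above can be illustrated with a minimal sketch in Python. The function name and the token counts are hypothetical and are not taken from the study's data; the sketch only shows the arithmetic of dividing the AAVE variants by the total of AAVE and SAE variants.

```python
def aave_percentage(aave_tokens, sae_tokens):
    """Percentage of AAVE variants out of all contexts where the variable could occur.

    Returns None when no tokens of either variant were found, since no rate
    can be reported for a variable that never occurs.
    """
    total = aave_tokens + sae_tokens
    if total == 0:
        return None
    return 100 * aave_tokens / total


# Hypothetical counts for one variable (illustrative only, not study data).
rate = aave_percentage(aave_tokens=30, sae_tokens=70)
print(rate)
```

Returning None for a variable with no tokens mirrors the decision, discussed in section 3.7, to exclude a variable for which no tokens were found.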

3.7. Excluded tokens
When collecting the data it was necessary to remove certain tokens from the analysis for a variety of reasons. One such case was where tokens occurred in a ‘neutralisation context’. Under these circumstances, “unambiguous identification of the variant is compromised” (Tagliamonte 2006: 91), and so the token must be excluded from the analysis. One frequently occurring example concerned plural –s deletion on the general class of noun plurals. Where a plural word ending in an alveolar sibilant ([s], [z]) precedes a word beginning with an alveolar sibilant, it is difficult, if not impossible, to discern whether or not the plural –s has been deleted. For example, the token: None of my friends speak (Jay-Z Reasonable Doubt) was excluded on the basis that it is impossible to unambiguously identify deletion or non-deletion of plural –s due to its context (the following syllable has an initial s).
Tokens that were produced in the context of reported speech were also omitted. This is because reported speech does not reflect the natural dialect of the speaker, as they are quoting somebody else, who may or may not speak differently (Tagliamonte 2006). For example, the copula/auxiliary non-deletion involving is forms (SAE variant) in the line: Fuck ‘rap is real’, watch the herbs stand still (Nas Illmatic) must be excluded from analysis because it is found in reported speech.
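The two exclusion rules described so far can be sketched as a simple filter. This is only an illustration under stated assumptions: in the study the coding was done by hand, and the token records, field names and sound labels below are hypothetical. The neutralisation rule is simplified to checking whether the word following a plural noun begins with [s] or [z].

```python
# Alveolar sibilants that neutralise the plural -s context (assumed labels).
SIBILANTS = {"s", "z"}


def is_excluded(token):
    """Return True if a token should be left out of the analysis."""
    # Rule 1: reported speech does not reflect the speaker's own dialect.
    if token["reported_speech"]:
        return True
    # Rule 2: a plural noun (variable 3) followed by a sibilant-initial word
    # is a neutralisation context, so the variant cannot be identified.
    if token["variable"] == 3 and token["next_initial_sound"] in SIBILANTS:
        return True
    return False


# Hypothetical token records; the field values are illustrative only.
tokens = [
    {"text": "None of my friends speak", "variable": 3,
     "next_initial_sound": "s", "reported_speech": False},
    {"text": "Laughing at baseheads", "variable": 3,
     "next_initial_sound": None, "reported_speech": False},
    {"text": "rap is real", "variable": 7,
     "next_initial_sound": "r", "reported_speech": True},
]

kept = [t["text"] for t in tokens if not is_excluded(t)]
print(kept)
```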
After data collection, there were no tokens whatsoever of variable 4 (use of remote time stressed ‘been’ to mark an action that took place, or a state that began, a long time ago and is still relevant). It would therefore be impossible to make any claims about the speakers’ use of that variable, and it was excluded. For this reason, there will be no further comment on the use of remote time stressed ‘been’ in this study. This seems a surprising result, given that it is cited in the extant literature as a unique feature of AAVE (see section 3.3, above). However, it is in fact consistent with Wolfram’s conclusion that “the use of remote been in urban areas appears to be receding” (Wolfram 2004: 120). If this is now a feature of rural speech, then it is perhaps no surprise that it is eschewed by urban artists, to whom regional identity is of the utmost importance.
In the case of the remaining seven variables, tokens of either the AAVE or SAE variants were found. It is these seven variables that provided the basis for analysis.

3.8. Chi-Square Analysis
A chi-square test was performed in order to assess whether the observed results could have occurred by chance, rather than as a result of some independent variable. In order to do this, a significance level is chosen in advance. In this case it is 0.05, meaning that the null hypothesis is rejected only if the probability of obtaining results at least this extreme by chance alone is below 5%. With the significance level set at 0.05, the degrees of freedom (df) were calculated from the data. Because there were only two data entries per variable, the data from all seven variables were entered into the chi-square test as an aggregate. This approach is less reliable, but the p-value for the aggregate results was smaller than the significance level, indicating that the results are statistically significant on aggregate.
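A minimal sketch of such a test is given below, assuming the aggregate data take the form of a 2×2 table (artist by variant), which gives one degree of freedom. The layout of the table and the counts are assumptions made for illustration and are not the study's figures. The statistic is Pearson's chi-square without continuity correction, and the p-value formula used holds only for df = 1.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if artist and variant were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    # With df = 1 the chi-square variable is a squared standard normal,
    # so the upper-tail probability is erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, df, p_value


# Hypothetical aggregate counts: rows are artists, columns are AAVE and SAE.
counts = [[60, 40],
          [40, 60]]
statistic, df, p_value = chi_square_2x2(counts)
significant = p_value < 0.05  # compare the p-value with the chosen level
print(statistic, df, p_value, significant)
```

The p-value is compared with the significance level of 0.05; equivalently, the statistic could be compared with the critical value for the relevant degrees of freedom.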

Linguistic Variation in Hip Hop: Variable Use of African American Vernacular English by New York Rappers Jay-Z and Nas

Lewis Lister
