NON-CONVENTIONAL ENGLISH LANGUAGE USE IN THE WRITING AND SPEECH OF THAI ACADEMIC WRITERS: A PRELIMINARY STUDY

This paper reports a preliminary study analyzing non-conventional language use in English manuscripts written by Thai academics and edited by the first author of the paper. The purposes were to classify the non-conventional language uses identified by the editor and to establish if there were common patterns of errors among the authors. The analysis identified the editor’s reason for suggesting each change to a manuscript and a nomenclature was constructed based on 15 language structure categories plus five non-structure categories. The writers of the manuscripts sampled were also interviewed in English and a sample of their speech was analyzed based on the nomenclature in respect of structural errors. The numbers of each type and category were compared across writers, and within writers between their writing and speech. High and significant correlations between writers and moderately high and significant correlations within writers were found. The findings suggest directions for further study which may offer valuable insights into whether the use of language for academic purposes and its use as a spoken interpersonal medium are related or whether the two skills are acquired differently.


INTRODUCTION
This study arose from work conducted by the first author (the editor), reviewing and editing the language content of manuscripts written by Thai authors before submission to journals published in English. During this work, the editor observed that although these writers could produce extended texts in a generally acceptable academic writing style, they remained prone to grammatical errors that generally did not involve complex English structures but related to elementary aspects such as verb inflection, noun pluralization and the article system.
Therefore, the initial intention of this work was to study the distribution and causes of structural errors, as well as other common forms of non-conventional language use (NCU) noted in the manuscripts, and to consider the underlying reason for each change suggested. The term NCU covers instances where, in editing a manuscript, the editor recommended a change based on any factor other than the information content of the paper. The term thus covers both language structure-based 'errors' (i.e. morphology or syntax) and non-structure NCUs, that is, uses of language not conforming to accepted patterns of lexical use or rhetorical style in general English or within the academic writing genre. The term structure NCU is also used, particularly in relation to the analysis of speech samples, where only structural 'errors' were considered.
The work reported herein was a preliminary study conducted to check the feasibility of a broader study and to prepare and refine the methodology, including developing a nomenclature of NCUs, as well as identifying practical and theoretical questions which a wider analysis of the corpus should address. In order to conduct the study, a small sample from the corpus was selected consistent with the balance of author genders and academic domains making up the overall corpus. The authors selected were approached and their agreement to participate obtained, following which each revision suggested at the time of editing their paper was analyzed and coded according to whether it was due to a structural or non-structural cause (as defined above). In addition, the authors of the manuscripts were interviewed and samples of their spoken English were analyzed in respect of the structure NCUs they contained. The data from each author's speech sample were then compared with the corresponding data from their manuscript.
Unlike other South East Asian nations with traditional links to European languages, Thailand has never experienced colonization by a European power and Thai has always been the language of government, education and social interchange. Although according to Crystal (2003) there are more than 17 million speakers of English as an additional language in Thailand (out of a population in 2015 of almost 70 million), personal experience and anecdotal reports suggest that this figure overstates the number of people who regularly use the language or are capable of doing so, and Thailand often ranks low in surveys of English ability (e.g. The Nation [2013] reported a survey of English proficiency in 60 non-English speaking countries in which Thailand ranked 55th).
Nevertheless, successive governments have sought to encourage Thais to acquire English, which is a compulsory subject in both the Thai National Curriculum (The Ministry of Education, 2008) and the university entrance examination system. However, the vast majority of Thai school students learn in Thai, with the penetration of English as the language of learning restricted to private schools outside the public education system and to English programs in government schools, where tuition is partly in English and partly in Thai.
At university, students are required to study English before graduating at undergraduate level and therefore take compulsory foundation English courses and may also take optional English courses. Additionally, most universities now offer undergraduate courses taught in English and there are a small number of universities offering exclusively English-medium courses. However, most Thai undergraduate students learn in Thai and are only exposed to English in a handful of courses. Only at master's and doctoral levels does English become a determining factor in education, because students must pass an English proficiency test, access both textbooks and journals in English, and may also need to publish their own research in an English journal before graduating.

LITERATURE REVIEW
In this preliminary study a data-driven, grounded approach was adopted, broadly guided by Strauss and Corbin (1998), who suggest that where a rich data source is available, widely reviewing literature prior to collecting and analyzing data should be avoided to preclude prejudging issues and simply following previous work in the field. Nevertheless, a number of theoretical and practical areas are clearly important to the study and the review that follows identifies the conceptual framework within which the study was conducted.
The study's overall context is academic writing, more specifically English for research publication purposes (Cargill & Burgess, 2008), which is itself a part of what has come to be known as English for academic purposes (EAP). Gillett (1996) situated EAP within the wider field of ESP because it is goal-directed, based on needs, taught to adults rather than children and involves specialist language. While Gillett stressed that EAP covers all uses of English in academic activity, he identified writing as the most important aspect of EAP, and highlighted accurate grammar and language forms as well as the formal language used in the genre as being crucial components of EAP. He noted, however, that many people with a need for skills in EAP do not have English as their first language. Hyland (2006) also noted that teachers of EAP are not necessarily native English speakers.
The study of errors in second language acquisition (SLA), and in particular contrastive analysis (CA) and error analysis (EA), has a rich literature which is not reviewed here other than to highlight the issue of the influence of a learner's mother tongue (or L1) when learning a second language (L2). This issue underpinned CA, which sought, by comparing two languages, to identify where they differed and thereby to predict where difficulty would be encountered by L2 learners because of the transfer of language features from their L1 (Lado, 1957). However, as CA gave way to EA as the dominant paradigm, the influence of the mother tongue in SLA was challenged (e.g. Corder, 1967; Dulay & Burt, 1974a) and errors came to be viewed as a necessary part of the development of an idiosyncratic interlanguage (Selinker, 1974). Later, however, the influence of the learner's L1 came to be recognized by many of its earlier detractors (e.g. Corder, 1994; Gass & Selinker, 1994) as being a significant factor in SLA.
There have been a number of CA and EA studies in Thailand looking at potential and actual areas giving rise to problems for Thai learners of English. Typical of CA studies is Nathong (1988), who noted significant similarities between the structure of the two languages in basic sentence patterns but also highlighted some important differences. Of these, the following areas could potentially give rise to mother tongue effects in the categories included in the nomenclature used to code the NCUs found in the participants' work in this study: articles/determiners, noun pluralization, possessives, prepositions, pronouns, verb form, word form and word order in noun phrases.
EA-based studies in Thailand have tended to concentrate on texts produced specifically for the study rather than authentic material. Recent exceptions include Sereebenjapol (2003), who looked at science-related theses published in English at a university in Bangkok, and Ayurawatana (2002), who analyzed research proposals submitted at a Thai university. The studies found global errors as well as local grammatical errors, with L1 interference being the main cause cited. Some errors were in areas of English regarded by the researchers as being complex or associated with the order of acquiring language features, mentioned below. The only study traced considering articles written by Thai academics (Jaroongkhongdach, Todd, Keyuravong, & Hall, 2012) did not consider errors in the language used by the authors but concentrated on rhetorical features, attributing the comparatively low quality of the Thai research papers to conflicts between national research policies and academics' motivations for conducting research, as well as national cultural values.
Within the field of SLA, two related issues that have received previous research attention and to which the present study might be relevant are the order in which language is acquired and the age at which learners begin learning. The order of acquisition hypothesis proposed by Dulay and Burt (1974b) suggested that there is an invariant order in which English morphemes are acquired which does not depend on learner age. This hypothesis was later extended and refined by Dulay, Burt, & Krashen (1982), who proposed that the acquisition order applied to both children and adults and to both writing and speech. They placed language features into four groups, with the later acquired features (perfect auxiliary and past participle) being placed in group IV and case and word order being placed in group I as the earliest learned features. They did not include determiners or articles in the grouped order, but in other work cited (Bailey, Madden, & Krashen, 1974 as cited in Dulay, Burt, & Krashen, 1982), it was suggested that they were among the earliest features acquired.
The order of acquisition hypothesis and the influence of age on learning were later commented on by Johnson and Newport (1989) who, working within a neuro-cognitive framework, tested the ability of Asian immigrants to the USA to detect structural errors through their neural responses. Their findings throw doubt on the suggestion that L1 learning and SLA are comparable, detecting differences in brain responses to sentences containing structural anomalies depending on the age of first exposure to English. They detected a linear relationship between the age at which learners began learning and their ultimate performance, with later learners experiencing greater difficulty in detecting structural errors. They also found that the relative difficulties experienced in different areas of structure were correlated with age of first exposure, supporting Dulay and Burt's (1974b) order of acquisition hypothesis, and concluded that the effects were more significant than those of the L1 on SLA. Most notably, difficulties with the use of determiners and noun pluralization produced the highest correlations with age of first exposure, while basic word order and the use of the -ing morpheme produced the lowest.

The corpus
The corpus from which the sample analyzed was drawn consisted of around 130 manuscripts written by academic authors, most of whom were Thai. These papers had all been reviewed and edited for their language content by the first author (the editor) since 2010. This editing was conducted entirely separately from the analysis carried out in the study and took place at least one year prior to its commencement. Most of the papers were submitted by their authors to the publication clinic at the graduate school of a major university in southern Thailand, which then sent the papers to the editor for review. The manuscripts were written for publication in English academic journals or in some cases to support a presentation at an international conference with publication in its proceedings.
The authors of the papers which made up the corpus were drawn from all five campuses of the university and represented more than half the faculties within them, covering a range of academic disciplines, including science, engineering, IT, the humanities and medical fields such as nursing and dentistry. The corpus also included a small number of papers reviewed by the editor from authors at other institutions. Before commencing the study, the permission of the graduate school to include the papers in this research was obtained, as was the informed consent of all the authors whose work was included in the study.

Vol. 4(2)(2016): 251-264
The authors were then interviewed to allow an extended sample of their spoken English to be collected. The interviews were semi-structured and, before the interview, a pre-interview questionnaire was sent to each participant requesting background information, on the basis of which an interview guide was constructed, which was used as an aid to conducting the interview. The interviews were ostensibly conducted to obtain the authors' personal and demographic details as well as details of their experience of learning English and publishing articles in journals, but the main purpose was to provide material which could be later analyzed for structural NCUs that were then compared to the structural NCUs in their writing. However, other than the information included in the following paragraph regarding the participants' background, the content of the interviews is not reported in this paper.
All the interviews conducted at the participant's workplace were recorded and lasted between 42 and 73 minutes. The four authors for whom data are included in this paper (A, B, D and E) were from four different academic domains: engineering, life science, IT and the humanities, with no two participants working at the same location. Two were female, two were male. They had all undertaken 12 years of elementary and high school education in Thailand. Participants B, D and E all learned in government schools and commenced learning English at age 10 or 11. Participant A, however, attended a private school and began learning English at age 7. All learned English throughout their secondary education. All had gained bachelor's and master's degrees at universities in Thailand, two in the South of Thailand, and two in Bangkok. Only one had majored in English. Two had gained PhDs overseas, one in the USA, one in China although English had been the language of instruction used. Their ages ranged from 35 to 45 and none came from privileged or high economic status backgrounds, three having been born in urban areas and one in a rural setting. All had undertaken secondary education in urban settings in southern Thailand, two in their home cities, two in cities distant from where they were born. Once the papers had been analyzed and the interviews conducted, the data were analyzed as described in the following section.

Data collection and analysis
The data collected were the numbers of instances of NCU classified according to type. Data were collected both from the manuscripts and speech samples of each participant. Initially, each instance where the editor had suggested an amendment to the authors' manuscript was identified manually by recording codes on a copy of the edited manuscript, denoting his reason for suggesting the amendment. The codes were generated during the coding process and were descriptive of the reason identified. The nomenclature was further refined by grouping together the codes into five areas: (language) structure, cohesion, (rhetorical and academic) style, lexical use and information content, with a further miscellaneous category covering amendments not falling into one of the mentioned categories.
Since the initial aim of the study was to investigate the problems which these Thai academic writers had in producing structurally accurate English, the structure category was further divided into 15 sub-categories as shown in Tables 1 and 2. The process of constructing the nomenclature was progressive, codes being added as necessary during the coding of each manuscript. A careful record was kept of the use of codes within manuscripts and, where necessary, classifications in earlier analyzed papers were amended in line with later amendments. Overall, by the end of the analysis of the papers written by the participating authors, the nomenclature extended to 220 codes.
At the end of the coding of each manuscript, the number of instances of each code was recorded and totals for each sub-category and category determined. These figures were then compared across manuscripts and correlations calculated. The total numbers of NCUs were also compared with the total numbers of words in each manuscript and the number of NCUs per 100 words calculated (hereafter expressed for convenience as NCU%). In total, the four manuscripts included in the analysis amounted to around 16,000 words and the speech samples drawn for analysis from the interviews to 4,000 words.
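The NCU% measure and the cross-manuscript correlations described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used in the study: the text does not name the correlation coefficient, so a Pearson product-moment correlation is assumed here, the function names `ncu_percent` and `pearson_r` are hypothetical, and the counts and word total are invented placeholders rather than study data.

```python
from math import sqrt

def ncu_percent(ncu_count, word_count):
    """Number of NCUs per 100 words of text (the NCU% measure)."""
    return 100.0 * ncu_count / word_count

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of counts (assumed coefficient)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical NCU counts for six categories in two manuscripts (not study data)
writer_1 = [120, 30, 45, 50, 5, 10]
writer_2 = [90, 20, 40, 35, 4, 6]

# Hypothetical manuscript length of 4,000 words
print(round(ncu_percent(sum(writer_1), 4000), 2))
print(round(pearson_r(writer_1, writer_2), 3))
```

With six categories per manuscript, a correlation computed this way has 4 degrees of freedom (n - 2), which matches the df reported for the category-level comparisons.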
The NCU data from the speech samples were collected following the transcription of the interviews. In order to sample the speech data, a randomly selected continuous section of the participant's speech of approximately 1,000 words was selected and, to render this comparable with the written data, common features of speech not present in written work were disregarded and incomplete utterances treated as being correct so far as uttered. The analysis of the NCUs in the speech data was restricted to structural NCUs with no consideration of lexical issues, style, or information content. The structural NCUs were categorized using the same 15 sub-categories identified in the coding of the manuscripts to produce a snapshot of the problem areas that the participants experienced in their everyday speech, capable of comparison with the analysis of their writing.
Finally, prior to interviewing each participant, the categories and sub-categories of NCU that accounted for more than 5% of the NCUs from their manuscript were identified (in all cases either six or seven categories/sub-categories) and towards the end of each interview the participant was asked to order those areas according to how difficult they regarded them. After the interview the participant's order was compared with the actual order based on the number of NCUs in their manuscript and a Spearman rank order correlation coefficient derived.
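The comparison of a participant's difficulty ordering with the actual ordering can be illustrated with the textbook formula for Spearman's rank order correlation. The sketch below assumes that neither ranking contains ties, which the text does not confirm; the function name `spearman_rho` is hypothetical and the two rankings are invented for illustration.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no tied ranks."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two equal-length rankings with at least 2 items")
    # Sum of squared differences between the paired ranks
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical example: six areas ranked 1 (hardest) to 6 (easiest)
participant_order = [1, 2, 3, 4, 5, 6]   # order given by the participant
actual_order = [2, 1, 4, 3, 6, 5]        # order implied by NCU counts

print(round(spearman_rho(participant_order, actual_order), 3))
```

If the NCU counts for two areas were equal, the actual ranking would contain ties and this shortcut formula would no longer be exact; averaging the tied ranks and applying a Pearson correlation to the ranks is the usual alternative.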

FINDINGS
Table 1 shows the NCU% for all four participants, which ranged between 9.2 and 20.5%, with the structure NCUs ranging from 3.4 to 9.5%. Whilst the number of NCUs detected varied, there was considerable consistency across the four authors in the types of NCU, with correlations at the code, sub-category and category levels all significant at p<0.05 or better. The correlations at individual code level ranged between 0.664 and 0.844 and were all significant at p<0.001 (df=218). At sub-category level they ranged between 0.599 and 0.884 and were all significant at p<0.01 or better (df=13), while at the category level the range was between 0.865 and 0.973, all significant at p<0.05 or better (df=4).
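One common way to check the significance of a correlation coefficient against its degrees of freedom is to convert r to a t statistic, t = r * sqrt(df / (1 - r^2)). The sketch below applies this to the lowest category-level correlation reported above (r = 0.865, df = 4). It is an illustration under assumptions: the text does not state which test was used or whether it was one- or two-tailed, the function name `t_from_r` is hypothetical, and the critical value is the standard tabulated two-tailed figure for alpha = 0.05 with 4 degrees of freedom.

```python
from math import sqrt

def t_from_r(r, df):
    """t statistic for testing a correlation r against zero, with df = n - 2."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(df / (1 - r ** 2))

# Lowest category-level correlation reported in the text: r = 0.865, df = 4
t_value = t_from_r(0.865, 4)

# Standard tabulated two-tailed critical value of t for alpha = 0.05, df = 4
# (a conservative assumption; the tail of the original test is not stated)
T_CRIT_05_DF4 = 2.776

print(round(t_value, 2), t_value > T_CRIT_05_DF4)
```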
This consistency of performance suggests that there may be common factors influencing the accuracy of the writing of the four participants. This is discussed further in the Discussion section below. The overall pattern of the distribution of NCUs was for structure to account for around half (range, 44.5-59.5%) with lexical and style NCUs each accounting for around 20% (ranges: 12.1-25% and 11.5-26.1%, respectively). Within the structure category, articles was consistently the largest sub-category (range 25.3-35.3%) with prepositions, nouns or verb-related problems (tense, form and misc.) being the next three largest areas (ranges: 9.1-20.1%, 4.1-16.9% and 5.4-28.3%, respectively). The three sub-categories producing the fewest NCUs were possessives, adverbs and agreement with ranges of 0-0.7%, 0-2.7% and 0.5-2.7%, respectively.

[Table column headings: structure sub-categories; % of structure NCUs; correlations (r)]
The results of the comparison of the structure NCUs in the participants' writing and the samples of their speech are shown in Table 2. In every case a difference can be seen between the overall structure NCU%s, although the direction of the difference is not consistent, with participant B showing a higher NCU% for writing than for speech, whereas participants A, D and E all show the opposite trend. However, the differences based on paired sample t tests were not significant at p<0.05 for participants A, B and E, although that for participant D was significant at p<0.01. In addition, all the correlation coefficients between the numbers of NCUs for the structure sub-categories were moderate and positive, and those for participants A, B and E were significant at p<0.05 or better. One-way ANOVAs were performed on the two sets of structure NCUs (writing and speech) but no significant differences were detected, suggesting a broadly similar level of accuracy in speech and in writing among the four participants (writing: F=2.52, speech: F=1.31; critical value of F=2.77, df=3, 56).
Finally, the participants' ratings of the difficulty of the areas which had produced the greatest numbers of NCUs in their writing were compared with the actual order, and the Spearman rank order correlation coefficients are shown in Table 3. As can be seen, none of the participants was very successful, with A, D and E all producing non-significant negative correlations between the predicted and actual orders and only B producing a small positive, though non-significant, correlation.
In summary, therefore, although the numbers of NCUs per participant varied (but not significantly), the patterns of NCU distribution were highly and significantly correlated. Further, the patterns of the participants' NCUs in their spoken and written English were moderately correlated, in three cases significantly, and only one participant produced a significant difference in spoken and written performance.
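The one-way ANOVA reported above, with degrees of freedom of 3 and 56, is consistent with four participants each contributing 15 sub-category values. A minimal sketch of the F computation under that reading follows. The function name `one_way_anova_f` is hypothetical, the counts are invented placeholders, and whether the original analysis used raw counts or percentages is not stated in the text.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least 2 groups and more values than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical NCU counts for 15 structure sub-categories, four participants
participants = [
    [30, 12, 10, 9, 8, 7, 6, 5, 4, 3, 2, 2, 1, 1, 0],
    [25, 15, 11, 8, 8, 6, 5, 5, 3, 3, 2, 1, 1, 0, 0],
    [40, 18, 14, 12, 9, 8, 7, 6, 5, 4, 3, 2, 2, 1, 1],
    [22, 10, 9, 7, 6, 6, 4, 4, 3, 2, 2, 1, 1, 1, 0],
]

f_stat, df_b, df_w = one_way_anova_f(participants)
print(round(f_stat, 2), df_b, df_w)
```

The resulting F would then be compared with the critical value for the same degrees of freedom, which the text gives as 2.77 for df = 3, 56.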
Overall, language structure accounted for approximately half the NCUs recorded with articles, nouns, verbs and prepositions producing the most structure NCUs and lexical and rhetorical issues accounting for most of the balance.Finally, the participants were unsuccessful in identifying the areas where they had produced the largest numbers of NCUs.

DISCUSSION
This, it must be emphasized, was a preliminary study aimed at establishing the feasibility of the method and identifying issues on which to focus in a broader study involving a larger sample drawn from the corpus of manuscripts. Therefore, at this stage it is only possible to identify possible patterns in the data, particularly those indicative of similar trends in the participants' use of English.
The first area in which a trend can be observed is the distribution of NCUs in the four manuscripts analyzed, with broadly similar proportions of NCUs being attributable to structural errors and to non-structural causes related to writing style. Further, the distribution of structural NCUs was, as anticipated, heavily weighted towards areas such as article and noun use, preposition use, and verb inflection, all areas in which Thai and English differ, and the possibility of a 'mother tongue' effect influencing the learning of English by these academics cannot be dismissed. This is also supported by the generally high correlations between the NCU% for the four participants, indicating that they all have broadly similar difficulties in controlling English grammatical usage which cannot be unrelated to the fact that all were brought up in Thailand speaking Thai as an L1 and all had similar educational backgrounds.
The distribution of errors also points to a hierarchical order of acquisition crudely agreeing with the ideas of Dulay and Burt (1974b), although the order suggested by the frequencies of errors made by the four participants in this study was closer to the order inferred by Johnson and Newport (1989) based on correlating numbers of errors with the age at which their participants, all immigrants of Asian origin, began learning English. They found correlations above 0.6 for (in descending order) past tense, plurals, pronouns and determiners. The order was much less closely aligned to Dulay and Burt's (1974b) order based on Spanish-speaking immigrant children in the US. Clearly, based on this small sample, no conclusion can be reached, but the findings suggest that if there is an order of acquisition effect, it may be idiosyncratic for Thai learners, again pointing to L1 influence.
Finally, the moderately high correlations found between the structure NCU%s in the participants' speech and writing suggest that there are common factors affecting both their ability to communicate verbally and their ability to use English as a written medium for academic purposes. However, such a small sample as this, confined as it is to academics, may not be representative of the broader population in Thailand where few people use English to any significant extent, and the effect may be of more significance in academics, who are a group within Thailand who do need to use English on a regular basis.

CONCLUSION
This preliminary study has produced data suggesting that the English proficiency in both the speech and writing of these advanced Thai users of English is affected by similar underlying problems. The findings also strongly suggest that the participants' learning of English was influenced by Thai, their L1, but there was also limited support for the order of acquisition hypothesis, though perhaps one idiosyncratic to Thai learners.
Therefore, an extended study is clearly warranted using the same basic methodology to collect a broader sample of data from a wider pool of participants.
The information from such a study would contribute to a better understanding of how academics in Thailand acquire basic language skills and then use them to develop their ability to use English in academic discourses and particularly to conduct research and publish articles in English language journals. It would also add to the knowledge of how academic language is acquired in environments where the language being learned is not widely spoken or used and how academics

Table 1. NCU% by participant and correlation coefficients

Table 3. Comparison of participants' ordering of areas producing the greatest number of NCUs