\par
Authorship attribution is an important problem in natural language processing, with applications in fields such as archeology, journalism, and law (for example, identifying the source of a ransom note might help save lives).
\par
Authorship attribution involves several related problems:
\begin{enumerate}
\item Identifying the author of a text: we have a database of authors and some of their works and, given an unknown text, we try to attribute it to one of these known authors.
\item Profiling the author of a text: the author is unknown, and we want to build a profile of the author based on the text alone. This means estimating attributes such as the ones enumerated below. Some of these attributes might be impossible to determine from a single text, but we might still be able to give a rough estimate.
\begin{enumerate}
\item the time when the author lived
\item an estimate of the age of the author when the text was written
\item the location where the author was born or lived, and whether the author came from a rural or urban background
\item the gender of the author
\item the education of the author
\item the occupations of the author
\end{enumerate}
\item Verifying the authorship of a text: we want to verify whether a text is correctly attributed to a specific author. For example, the authorship of some of the books in the \emph{Bible} is contested: traditionally, the \emph{Letter to the Hebrews} is attributed to \emph{Paul}, but most researchers reject this view. A related goal is detecting instances of plagiarism.
\end{enumerate}
\par
In the following sections, we will look into each of these problems in greater detail. A survey of existing research on this topic is presented in section ...\todo{Add chapter number here}. In section ...\todo{Same here}, I will present my own research and methodology.
\todo[inline]{This introduction could be improved.}