1. Overview

	This folder contains the replication materials for
	“Reading between the lines: Uncovering asymmetry in the central bank loss function”.

	The analysis combines R-based text methods, topic modeling, and Python-based transformer language models.

2. Computational Environment

	R:
	R version 4.1.0 (2021-05-18) – "Camp Pontanezen"
	Platform: x86_64-w64-mingw32/x64 (64-bit)

	Required R packages and their versions are documented in:

	R library packages and versio info/main_library_package_versions.xlsx
	R library packages and versio info/topic_model_library_package_versions.xlsx

	Topic modeling uses a partially different package set.

	To ensure reproducibility, run each R script from a clean R session in RStudio by selecting "Session" -> "Restart R and Run All Chunks".
	Alternatively, restart R (Ctrl + Shift + F10) and then run all chunks (Ctrl + Alt + R).


	Python (language models):
	Python 3.6
	PyTorch 1.10.2 (CPU build)
	transformers 4.11.3
	CUDA not available (CPU only)

	All Python scripts were executed on an Azure Virtual Machine with the following specifications:
	Intel Xeon CPU E5-2673 v4
	14 GB RAM
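
	The Python versions listed above can be compared against the active environment before running the language models. The following is a minimal sketch, not part of the replication package: the function names are hypothetical, and it assumes that a local build suffix such as "+cpu" in the PyTorch version string can be ignored.

```python
import sys

# Versions as listed in this README.
EXPECTED = {"python": "3.6", "torch": "1.10.2", "transformers": "4.11.3"}


def installed_versions():
    """Collect version strings; packages that cannot be imported map to None."""
    found = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in ("torch", "transformers"):
        try:
            module = __import__(name)
            found[name] = getattr(module, "__version__", None)
        except Exception:
            # Missing or broken installation.
            found[name] = None
    return found


def version_mismatches(found, expected=EXPECTED):
    """Return {name: (expected, found)} for every entry that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = found.get(name)
        # Assumption: drop local build suffixes, e.g. "1.10.2+cpu" -> "1.10.2".
        have_base = have.split("+")[0] if have else None
        if have_base != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches(installed_versions()).items():
        print("{}: expected {}, found {}".format(name, want, have))
```

	An empty result means the three version strings match the ones documented here. It does not check the hardware or the CPU-only setting.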

3. Folder Structure

	estimation data/ – regression datasets
	language models/ – input/output data and Python scripts
	latex-tables and figures/ – manuscript outputs
	modified LM dictionary/ – adjusted L&M dictionary
	R library packages and versio info/ – R package version lists
	rules/ – bigram and trigram rules
	tone/ – tone indices

	Main R scripts (run in order):

	1. Lexicon-based whole text tone.Rmd
	2. Figure 1 and 3.Rmd
	3. Topic model and Figure 2.Rmd
	4. Lexicon-based inflation text tone.Rmd
	5. Text data to language models.Rmd
	6. Language model tone indices.Rmd
	7. Time dimensions GPT-4.0.Rmd
	8. Figures 4-7 and selected Appendix figures.R
	9. Loss function estimations.Rmd

4. Execution Order

	Step 1: Run R scripts 1–5.
	Step 2: Execute the Python scripts in the language models/ directory using the computational environment specified above.
	Step 3: Run R scripts 6–9.
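
	The three steps above can be sketched as a small planning script. This is an illustration only and is not included in the replication package. It assumes the file names from the "Main R scripts" list, that Rscript and the rmarkdown package are available, and that rendering an .Rmd file with rmarkdown::render() is an acceptable substitute for "Restart R and Run All Chunks". The function names are hypothetical. The script only builds the commands and does not execute them.

```python
# File names taken from the "Main R scripts" list in Section 3.
R_SCRIPTS = [
    "1. Lexicon-based whole text tone.Rmd",
    "2. Figure 1 and 3.Rmd",
    "3. Topic model and Figure 2.Rmd",
    "4. Lexicon-based inflation text tone.Rmd",
    "5. Text data to language models.Rmd",
    "6. Language model tone indices.Rmd",
    "7. Time dimensions GPT-4.0.Rmd",
    "8. Figures 4-7 and selected Appendix figures.R",
    "9. Loss function estimations.Rmd",
]


def r_command(script):
    """Build an argument list that runs one script in a fresh R process."""
    if script.lower().endswith(".rmd"):
        # Assumption: rendering the document executes all of its chunks.
        return ["Rscript", "-e", 'rmarkdown::render("{}")'.format(script)]
    return ["Rscript", script]


def execution_plan(scripts=R_SCRIPTS, split_after=5):
    """Split the R scripts into the runs before and after the Python step."""
    before = [r_command(s) for s in scripts[:split_after]]
    after = [r_command(s) for s in scripts[split_after:]]
    return before, after


if __name__ == "__main__":
    before, after = execution_plan()
    for cmd in before:
        print("Step 1:", cmd)
    print("Step 2: run the Python scripts in 'language models/'")
    for cmd in after:
        print("Step 3:", cmd)
```

	Each argument list can be passed to subprocess.run(cmd, check=True) so that a failing script stops the sequence before Step 2 or Step 3 begins.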


5. Reproducibility Note

	For full computational replicability, users should employ the same:

	- R version (4.1.0)
	- Python version (3.6)
	- Package and library versions
	- Compute environment (CPU-based execution, no CUDA)

	Differences in software versions, hardware architecture, or linear algebra backends may result in small numerical variation.
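
	One way to judge whether a rerun differs only by such small numerical variation is to compare regenerated values with the shipped ones under a tolerance. The sketch below is an assumption-laden illustration: the function names are hypothetical and the tolerances are arbitrary defaults, not thresholds used in the paper.

```python
import math


def max_abs_diff(reference, replicated):
    """Largest absolute difference between two equally long numeric series."""
    if len(reference) != len(replicated):
        raise ValueError("series differ in length")
    return max((abs(a - b) for a, b in zip(reference, replicated)), default=0.0)


def matches_within(reference, replicated, rel_tol=1e-6, abs_tol=1e-9):
    """True if the series have equal length and agree element by element."""
    return len(reference) == len(replicated) and all(
        math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
        for a, b in zip(reference, replicated)
    )
```

	For example, two columns of tone indices read from the shipped and the regenerated output files could be passed to matches_within() as plain lists of floats.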




6. Directory Map


/ (codes)

│
├── 1. Lexicon-based whole text tone.Rmd
├── 2. Figure 1 and 3.Rmd
├── 3. Topic model and Figure 2.Rmd
├── 4. Lexicon-based inflation text tone.Rmd
├── 5. Text data to language models.Rmd
├── 6. Language model tone indices.Rmd
├── 7. Time dimensions GPT-4.0.Rmd
├── 8. Figures 4-7 and selected Appendix figures.R
├── 9. Loss function estimations.Rmd
│
├── README.txt
├── intro.csv – ECB Introductory Statements
├── default loughran.xlsx – Default Loughran & McDonald (2011) dictionary
├── figure_dataset.xlsx – Combined dataset for tone indices
├── inflation texts.xlsx – Inflation texts selected by LDA topic model
├── inflation data.xlsx – Inflation time series
│
├── estimation data/
│   ├── Control variables.xlsx
│   ├── Meeting based data.xlsx
│   └── Quarterly data.xlsx
│
├── language models/
│   ├── Central Bank RoBERTa language model.ipynb
│   ├── FinBERT_language_model.ipynb
│   ├── GPT-4.0 sentence classification.py
│   ├── (Excel output files)
│   └── README.txt
│
├── latex-tables and figures/
│   ├── Figure PDFs
│   └── Table files
│
├── modified LM dictionary/
│   └── (Adjusted Loughran–McDonald dictionary files)
│
├── R library packages and versio info/
│   ├── main_library_package_versions.xlsx
│   └── topic_model_library_package_versions.xlsx
│
├── rules/
│   ├── bigram-rules.xlsx
│   └── trigram-rules.xlsx
│
└── tone/
    ├── Lexicon-based tone outputs
    ├── FinBERT outputs
    └── RoBERTa outputs
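
	Before running anything, the layout above can be checked with a short script. This is a minimal sketch that is not part of the replication package: the function name is hypothetical, and the list covers only a subset of the entries named in the directory map.

```python
from pathlib import Path

# A subset of the entries shown in the directory map, relative to the root folder.
EXPECTED_ENTRIES = [
    "intro.csv",
    "default loughran.xlsx",
    "figure_dataset.xlsx",
    "inflation texts.xlsx",
    "inflation data.xlsx",
    "estimation data/Control variables.xlsx",
    "estimation data/Meeting based data.xlsx",
    "estimation data/Quarterly data.xlsx",
    "rules/bigram-rules.xlsx",
    "rules/trigram-rules.xlsx",
    "language models",
    "modified LM dictionary",
    "tone",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files or folders that do not exist under root."""
    root = Path(root)
    return [entry for entry in expected if not (root / entry).exists()]


if __name__ == "__main__":
    for entry in missing_entries("."):
        print("missing:", entry)
```

	Running the script from the root folder prints one line per missing entry and prints nothing when every listed entry is present.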

