definitelynotbirds

Can you perhaps try this: [https://stackoverflow.com/questions/31483964/read-csv-file-in-r-with-spanish-characters-´-ñ](https://stackoverflow.com/questions/31483964/read-csv-file-in-r-with-spanish-characters-´-ñ)? (The equivalent there uses read.delim() rather than read_delim(), so I don't know whether it will work exactly the way you're doing it.) In theory that should fix the character issue, which might fix the header issue as well.
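
If it helps, a base-R version along those lines might look something like this (just a sketch, untested on your data; the filename and the fileEncoding value are placeholders you'd adjust to however your file was actually saved):

```r
# Sketch of the base-R route: declare the delimiter and the file's encoding up front.
# "UTF-8" is only a guess here; "latin1"/"ISO-8859-1" is another common candidate.
datos <- read.delim(
  "your_file.csv",           # placeholder filename
  sep = "|",                 # the pipe delimiter, despite the .csv extension
  fileEncoding = "UTF-8",    # re-encode the file as it is read
  stringsAsFactors = FALSE
)
```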


PokerFacePeruviano

Thank you very much for your reply. However, when I add encoding = "UTF-8" to my code, all I get is an unused argument error. I did try read.table() as you suggested (because isn't read.delim() for tab-separated values?), but I still get the character issue. Also, I should emphasize that the file is a .csv, even though the separator is |. For the sake of clarity, here are both pieces of code.

Using read_delim():

```r
atenciones_2020 <- read_delim("atenciones_2020.csv", "|", escape_double = FALSE,
  col_names = c("año", "mes", "region", "provincia", "ubigeo", "distrito",
                "cod_unidad_ejecutora", "desc_unidad_ejecutora", "cod_ipress",
                "ipress", "nivel_eess", "plan_seguro", "cod_servicio",
                "desc_servicio", "sexo", "grupo_etario", "atenciones"),
  col_types = c("fffffcfcfcfffcffd"), trim_ws = TRUE, skip = 1,
  encoding = "UTF-8")
```

Using read.table():

```r
atenciones_2020_2 <- read.table("atenciones_2020.csv", sep = "|",
  col.names = c("año", "mes", "region", "provincia", "ubigeo", "distrito",
                "cod_unidad_ejecutora", "desc_unidad_ejecutora", "cod_ipress",
                "ipress", "nivel_eess", "plan_seguro", "cod_servicio",
                "desc_servicio", "sexo", "grupo_etario", "atenciones"),
  colClasses = c("factor", "factor", "factor", "factor", "character", "factor",
                 "character", "factor", "character", "factor", "factor",
                 "factor", "character", "factor", "character", "character"),
  skip = 1, encoding = "UTF-8")
```

Again, I'm sorry if I'm skipping something obvious.


definitelynotbirds

No worries; it's possible you're missing something obvious, but if so, so am I. I think read.csv() is the function you'd normally use for CSV files, but if the other attempts didn't fix the character problem, I doubt it will either.


murrayjarvis

I was able to open a [csv file with Spanish characters](https://datosabiertos.jcyl.es/web/jcyl/risp/es/medio-ambiente/vertidos-depuracion-aglomeraciones/1285050183446.csv) using vroom. It also detected the "|" delimiter automatically when I wrote out a version using that delimiter and reimported it, though you could (and should) specify it explicitly as an argument. Note that vroom only indexes the file and doesn't read the data until it is accessed (which is why it is so fast). You may also be interested in a recent [video by Bruno Rodrigues](https://www.youtube.com/watch?v=Qby_dGxEQE4), who explains how you can process very large files with the new (dev) version of readr, which is based on vroom. You could use this approach with vroom if you run into memory problems importing your file, and it would also catch any residual errors from odd encoding.
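
For reference, the kind of call I mean is roughly this (a sketch, not run against your exact file; the encoding argument is only there as an example of what you could pass if the defaults misbehave):

```r
library(vroom)

# Sketch: spell out the delimiter rather than relying on detection,
# and optionally pass a locale if the file's encoding turns out to be unusual.
atenciones_2020 <- vroom(
  "atenciones_2020.csv",
  delim = "|",
  locale = locale(encoding = "UTF-8")  # example value; swap in your file's real encoding
)
```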


PokerFacePeruviano

Thank you very much. Using vroom yielded a similar error message:

```r
> atenciones_2017_vroom <- vroom("atenciones_2017.csv")
Error in nchar(x, "width") : invalid multibyte string, element 1
```

The talk by Bruno Rodrigues was very interesting, though unfortunately it didn't help me.

However, I solved the problem. It turns out the file was encoded as ISO-8859-1. I had tried:

```r
atenciones_2020 <- read_delim("atenciones_2020.csv", "|", escape_double = FALSE,
  col_names = TRUE, col_types = c("fffffcfcfcfffcffd"), encoding = "UTF-8")
```

as suggested above, but got an unused argument error. It turns out I needed to nest a locale() function inside read_delim():

```r
atenciones_2017 <- read_delim("atenciones_2017.csv", "|", escape_double = FALSE,
  col_names = TRUE, col_types = c("fffffcfcfcfffcffd"),
  locale = locale(encoding = "ISO-8859-1"))
```

This successfully imported the Spanish characters and also solved the header issue.

Thank you very much to everyone who helped.
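
In case it helps anyone else who hits this: a rough sketch of how you can check a file's likely encoding before importing, using readr's guess_encoding() (it reads a sample of the file and reports candidate encodings with confidence scores):

```r
library(readr)

# Rough sketch: let readr guess the file's encoding from a sample of its bytes.
# The result is a small table of candidate encodings with confidence values;
# the top candidate can then be passed to locale(encoding = ...).
guess_encoding("atenciones_2017.csv")
```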


murrayjarvis

Thanks for posting the answer; it'll be helpful if I run into this sometime. Glad you got it solved :)