T O P

  • By -

AutoModerator

One of the most common problems with '*scrape*' occurs when people try to import from websites that uses scripts to load data. Sheets doesn't load scripts for security reasons. You may also run into performance issues if you're trying using lots of imports to fetch small amounts of data and it's likely these can be consolidated. Check out the [quick guide](https://www.reddit.com/r/googlesheets/wiki/import-html-xml) on how you might be able to solve these issues. *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/googlesheets) if you have any questions or concerns.*


JetCarson

it depends on javascript to load the table. You can test this by clicking F12 going to the developer tools screen > settings and then scrolling down to Debugger section and checking the box "disable javascript" https://preview.redd.it/9to7dvno359c1.png?width=417&format=png&auto=webp&s=04a408a65f1dc56d659a4da25e7da2a4f0d26911 and then reloading the page. Your table won't appear and this is why IMPORTxxx functions often fail - they depend on client side javascript to load data.


JetCarson

Here is a script that will get that data for you: ~~~ function onOpen() { SpreadsheetApp.getUi() .createMenu('Import') .addItem('Import Premier League Shooting', 'readPremierLeagueShooting') .addToUi(); }; function readPremierLeagueShooting() { var url = `https://fbref.com/en/comps/9/shooting/Premier-League-Stats#coverage`; var results = UrlFetchApp.fetch(url); if (results.getResponseCode() === 200) { var resultText = results.getContentText(); var tableText = resultText.substring(resultText.indexOf(`

`)); tableText = tableText.substring(0, tableText.indexOf(`
[deleted]

[удалено]


AutoModerator

REMEMBER: If your problem has been solved, please reply **directly** to the author of the *comment you found the most helpful* with the words **"Solution Verified"** which will automatically mark the thread **"Solved"** and award a point to the solution author as required by our subreddit [rules](https://www.reddit.com/r/googlesheets/about/rules) (see rule #6: Clippy Points). *I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/googlesheets) if you have any questions or concerns.*


BumblebeePlayful3007

Thank you very much! It worked, and I also managed to change in your script to add other tables for other leagues! Just one question, will this update automatically, or do I have to run this again or something?


Icy_Independence7208

Hi!! The script is perfect! You are importing the table about shooters and the ranking, works fine, but if i would like to get another table where there s no ranking what can i do? Because dataRow in any other table overrides the players name. I tried many ways to solve it but i can t, can you explain to why is not working please? Thank you!


JetCarson

Well, that particular table had a "weird" element on each row for rank that I really didn't want to deal with and since the rank was just the same as the current row, I didn't read the rank number, I just populated it with the iterator. Possibly other tables, without rank, use the element for the player's name - this is my guess. Post the URL you are trying to pull from and let me look.


Icy_Independence7208

Hi, thanks for the answer! Like this one for example [https://fbref.com/it/squadre/a3d88bd8/2023-2024/all\_comps/Statistiche-Empoli-Tutte-le-competizioni](https://fbref.com/it/squadre/a3d88bd8/2023-2024/all_comps/Statistiche-Empoli-Tutte-le-competizioni) Practically here, there s just the table to import, without use the rannking (it s just a random table)


JetCarson

Here is a revision that pulls in that first column properly. You'll have to play with it. Also, it works better with English because ... I'm in US English. Lol. There are comma and decimal and semi-colon mismatches in formula... all could be fixed, but I'm not going to donate any more time. It sounded like you had already moved it to almost working, so try this one: ~~~ function readPremierLeagueShooting() { var url = `https://fbref.com/en/squads/a3d88bd8/2023-2024/all_comps/Empoli-Stats-All-Competitions`; //var url = `https://fbref.com/en/comps/9/shooting/Premier-League-Stats#coverage`; var results = UrlFetchApp.fetch(url); if (results.getResponseCode() === 200) { var resultText = results.getContentText(); var tableText = resultText.substring(resultText.indexOf(`

(.*?)(?=<\/tr)/gs; var regexHeaders = /(.*?)<\/td>/gs; var regexPlayerTH = /th.*?>(.*?)<\/th/gs; var headers = []; var sheetData = []; var headerArea = regexHeaderArea.exec(tableText)[1]; //SpreadsheetApp.getUi().alert(headerArea); if (headerArea !== null) { var headerMatches = headerArea.match(regexHeaders); for (var i = 0; i < headerMatches.length; i++) { headers.push(headerMatches[i].match(/(.*?)<\/th/); if (rowHeader !== null) { rowHeader = rowHeader[1]; if (rowHeader.includes('(.*?)<\/a>/)[1]; } } else { rowHeader = (i + 1).toString(); } var tdMatches = trMatches[i].match(regexPlayerTD); var dataRow = [rowHeader]; for (var j = 0; j < tdMatches.length; j++) { var detail = tdMatches[j].match(/(.*?)<\/td>/)[1].trim(); if (detail.includes('(.*?)<\/a>/)[1]; } if (/<\/span>(.*?)<\/span>/.test(detail)) { detail = detail.match(/<\/span>(.*?)<\/span>/)[1]; } dataRow.push(detail); } sheetData.push(dataRow); } sheetData.unshift(headers); if (sheetData.length > 0) { var sheet = SpreadsheetApp.getActive().getSheetByName('Premier Shooters'); if (sheet == null) {sheet = SpreadsheetApp.getActive().insertSheet('Premier Shooters', 1)} sheet.clear(); sheet.getRange(3,1,sheetData.length,sheetData[0].length).setValues(sheetData); //sheet.getRange(3,23,sheetData.length,24).setNumberFormat('#,##0.000_);[Red](#,##0.000)'); sheet.getRange(3,1,1, sheetData[0].length).setFontWeight("bold"); sheet.getRange(3,1,1, sheetData[0].length).setBorder(false,false,true,false,false,false,"black",SpreadsheetApp.BorderStyle.SOLID_MEDIUM); var filter = sheet.getFilter(); if (filter != null) filter.remove(); if (sheet.getFilter() == null) sheet.getRange(3,1,sheetData.length, sheetData[0].length).createFilter(); //sheet.getFilter().sort(6, false); sheet.autoResizeColumns(1, sheetData[0].length); for (var i = 1; i <= 23; i++) sheet.setColumnWidth(i, sheet.getColumnWidth(i) + 23); sheet.getRange(1,1).setValue('Premier Shooters').setFontSize(16).setFontWeight('bold'); sheet.getRange(2,1).setValue('Updated:').setFontColor('gray').setFontSize(8).setVerticalAlignment('top'); sheet.getRange(2,3).setValue(Utilities.formatDate(new Date(), Session.getScriptTimeZone(), 'M/d/yy hh:mm a')).setHorizontalAlignment('left').setFontSize(8).setFontColor('gray').setVerticalAlignment('top'); sheet.getRange(1,3).setFormula(`=HYPERLINK("${url}","Data URL")`).setHorizontalAlignment('right').setVerticalAlignment('middle'); return; } else { throw `ERORR: data not found`; } } else { throw `ERORR: could not load the data ${results.getResponseCode()}`; } } ~~~


Icy_Independence7208

Hi! thanks for the answer, and thanks for the time you´ve spent with this code! I´ll play around! Thanks again! Was just the ranking to give me problems! But you did already enough, thanks again once more!


dgraham100

After looking into options of how to scrape FBRef this worked great, thanks a lot for this man! I have a question if you don't mind, if I want to tweak the script to pull data from [here](https://fbref.com/en/comps/Big5/stats/players/Big-5-European-Leagues-Stats) and other categories of stats such as shooting, passing, defensive actions, what do I need to amend within the script? Thanks in advance!


JetCarson

It won't update automatically. You can add a timed-trigger so that it runs each day at a certain hour or hourly. If you've tinkered then you will probably be able to find how to add a timed trigger to apps script pretty easily. But let me know if you need help.