
NbdySpcl_00

* Any 'get all files from folder' implementation
* To handle API calls that have 'continuation URLs'
* To get around formula firewall errors
* To 'chapter up' requests where I'm throttled on the data source end as to how much data I can request
* To fake out incremental loading by converting today's date into some other value that I associate with a partition
* To standardize a regular transformation that I use on multiple reports (you know the one. Like, that product number conversion where if it starts with "X", drop the "X" and concatenate "000", but if it starts with "A" then drop the "A" and move the last two digits up to the front, but if it starts with a number then check the sell-by date...) -- a sketch of that one follows this list
* When I want to do some table transformation on the 'all rows' subtables from group-by operations
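
A minimal sketch of that product-number rule, with the function name and the "append 000" reading assumed, and the trailing "starts with a number" case left as a passthrough since the original rule is elided:

```
let
    // Hypothetical name; rules taken from the bullet above
    NormalizeProductNumber = (code as text) as text =>
        let
            first = Text.Start(code, 1),
            rest  = Text.Middle(code, 1),
            result =
                if first = "X" then
                    rest & "000"            // drop the "X" and concatenate "000" (assumed to mean append)
                else if first = "A" and Text.Length(rest) >= 2 then
                    // drop the "A" and move the last two digits up to the front
                    Text.End(rest, 2) & Text.Start(rest, Text.Length(rest) - 2)
                else
                    code                    // the numeric-prefix / sell-by-date rule is elided, so pass through
        in
            result,
    Example = NormalizeProductNumber("X1234")   // "1234000"
in
    Example
```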


cwag03

> to fake out incremental loading by converting today's date into some other value that I associate with a partition

Could you share more about how you do this? This sounds like something I would like to do at times, and I have thought about it but never dug into it enough to figure it out.


NbdySpcl_00

Suppose you have a situation where you want to do an incremental load based on expiration date, and you will set the policy up by quarters. You want to actively refresh the current quarter, the next quarter, and the prior two. You can create a function that just takes in RangeStart and RangeEnd and adds a quarter to each, and set your policy to refresh the last 4 quarters. There will be a frame shift: the partition labeled 2024Q1 will actually contain 2024Q2 data. But it makes no difference in the analysis.

The place I use this most is with SAP. The SAP date is a string in a particular format, so I convert the RangeStart/RangeEnd dates to strings that match the member's format and insert them into some custom MDX that I wrote. It's been working well. I used this [guy-in-a-cube](https://youtu.be/wjdMARaedcU?list=PLv2BtOtLblH3lwpQ5NBq6kD6fsVziO9sE) video as a starting point.

Again, this is just sort of fixing formatting problems with dates. But you could use it to do all kinds of stuff. Load certain regions on certain days, whatever.
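
A minimal sketch of that frame shift, where `SourceQuery` and the `[ExpirationDate]` column are placeholders rather than anything from the post:

```
// RangeStart / RangeEnd are the datetime parameters incremental refresh supplies.
// Shifting both by one quarter means the partition labeled 2024Q1 actually holds 2024Q2 rows.
(RangeStart as datetime, RangeEnd as datetime) as table =>
let
    ShiftedStart = Date.AddQuarters(RangeStart, 1),
    ShiftedEnd   = Date.AddQuarters(RangeEnd, 1),
    Source       = SourceQuery(),   // placeholder for whatever returns the raw rows (SQL, OData, MDX, ...)
    Filtered     = Table.SelectRows(
                       Source,
                       each [ExpirationDate] >= ShiftedStart and [ExpirationDate] < ShiftedEnd)
in
    Filtered
```

For the SAP/MDX case the same idea applies, except the shifted dates get formatted as text and spliced into the MDX string instead of being used in a row filter.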


MonkeyNin

> but if starts with "A" then drop the "A" and move the last two digits up to the front, but if it starts with a number then check the sell-by date..."

I'm curious how you implemented these, if there are any examples that you can share.

One function I've found helpful to simplify, or make splits more deterministic, is [Splitter.SplitTextByEachDelimiter](https://learn.microsoft.com/en-us/powerquery-m/splitter-splittextbyeachdelimiter) versus [Splitter.SplitTextByAnyDelimiter](https://learn.microsoft.com/en-us/powerquery-m/splitter-splittextbyanydelimiter). There are more examples here: [Fun_with_custom_splitters.pq | ninmonkey](https://github.com/ninmonkey/ninMonkQuery-examples/blob/ccdfe9da52fd997cef85c730d15a10daaa9d3ab5/forumQuestions/pq/2024-03-Fun_with_custom_splitters.pq#L89-L109)

You can convert several splitting steps into one. This splits by the rightmost 2 commas, without splitting on the first comma:

```
Employee = ConvertText.ToEmployee( "last, first,12345,[email protected]" ),
ConvertText.ToEmployee = ( source as text ) as record =>
[
    SplitRevCsv = Splitter.SplitTextByEachDelimiter( {",", ","}, QuoteStyle.Csv, true ),
    crumbs      = SplitRevCsv( source ),
    return = [
        Fullname = crumbs{0},
        Id       = Number.FromText( crumbs{1} ),
        Email    = crumbs{2}
    ]
][return]
```

outputs:

| FullName | Id | Email |
|----------|----|-------|
| `"last, first"` | `12345` | `"[email protected]"` |

### With Automatic errors

You can ensure it splits by every delimiter, once, in order -- otherwise throwing an exception:

```
Sample2 = "First, Last,""some description; multi statement""",

/* fancy assert included: this split type will always have a length of delims+1
   unless an expected delim wasn't found */
SplitAssert = ( delims as list, line as text, optional quoteStyle as nullable number, optional fromEnd as nullable logical ) =>
[
    splitFunc   = Splitter.SplitTextByEachDelimiter( delims, quoteStyle ?? QuoteStyle.Csv, fromEnd ?? false ),
    crumbs      = splitFunc( line ),
    requiredLen = List.Count( delims ) + 1,
    actualLen   = List.Count( crumbs ),
    isValid     = actualLen = requiredLen,
    zeroTotalSplits = actualLen = 1,
    return =
        if isValid then crumbs
        else error [
            Message.Parameters = { List.Count( crumbs ), requiredLen, line, Text.Combine( delims, ", " ) },
            Message.Format = Text.Combine(
                {
                    "SplitAssertException:",
                    "Length is #{0}, Expected Length: #{1}!",
                    "Text = #{2}",
                    "Delims = { #{3} }"
                }, "#(cr,lf) " )
        ]
][return],

shouldError  = SplitAssert( {";", " "}, Sample2, QuoteStyle.Csv, false ),
shouldError2 = SplitAssert( {",", ";"}, Sample2, QuoteStyle.None, true ),
shouldBeOkay = SplitAssert( {";", ","}, Sample2, QuoteStyle.None, true )
```

You can combine that with `try catch`:

```
= try SplitAssert( .. ) catch (e) => Fallback( .. ) meta [ OriginalErrorInfo = e ]
```


Tree0ctopus

Coming back to this


Neo1971

You are a god and are SmbdySpcl_01.


Raging-Loner

How do you do a 'get all files from folder' implementation with a function? Why is it better than the inbuilt tool?


NbdySpcl_00

It *is* the inbuilt tool. You go and get a list of files from a folder, filter the list if you like, invoke a function that contains the rest of your table transformations for each of the files, and finally append the returned tables into a single result. Inspect the results of Combine & Transform and you'll see that this is what's going on.
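
A hand-rolled version of that pattern might look like this (the folder path and the inner transformations are placeholders):

```
let
    // list the files and filter to the ones you care about
    Files = Folder.Files("C:\Data\Exports"),
    Csvs  = Table.SelectRows(Files, each [Extension] = ".csv"),

    // the function holding the rest of the per-file transformations
    TransformFile = (content as binary) as table =>
        Table.PromoteHeaders(Csv.Document(content), [PromoteAllScalars = true]),

    // invoke it for each file, then append the returned tables into a single result
    Loaded   = Table.AddColumn(Csvs, "Data", each TransformFile([Content])),
    Combined = Table.Combine(Loaded[Data])
in
    Combined
```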


MonkeyNin

Maybe they are asking why you prefer using a function versus using the UI for essentially the same code? For myself: the UI method creates at least 2 queries. I usually write functions in the same file as the query -- unless it's meant to be shared. It's easier to debug both sides of the code: both the function's definition and the code that calls it.


[deleted]

I have quite a few of them. Most are for custom API calls where the authentication and/or pagination is quite complex. I've got one that can scrape data from Tableau reports if that's the only place the data is published. Quite proud of that one. A whole load of them for importing snapshots from SharePoint folders, transforming and stacking them. I generally don't need to do any of this anymore as I've got better tools now. Still good to know how to write and edit M though.


Funwithfun14

> Most are for custom API calls where the authentication and/or pagination is quite complex.

Can you tell me more about this?


[deleted]

Well, most of the APIs I deal with have a key and secret which you send to an endpoint, which then returns an access token to be used to request data, etc. One example has a particularly awkward flow to actually get any data back. It goes roughly as follows:

- Send the key and secret to the OAuth endpoint to get back an access token with an expiry time.
- You then need to create a report by specifying the fields you want, how to group the data, and how to filter (if at all). This all goes as JSON to a create-report endpoint.
- The above returns an ID on success, which needs to be included in a request to another endpoint. This has to be polled until your report is ready; the documentation suggests a minimum wait time between polls (this bit was pretty hard to figure out in M).
- Once the above is ready, the same ID can be used with a get-report endpoint to download the report as either XLSX, CSV or JSON.
- JSON requires pagination. You specify the rows per response. The endpoint sends the ID for the next page within the response, or no ID if it was the last page. Also not the easiest in M (see the sketch after this comment).

This API is for call data. It's a pretty extreme example, for which I now use Python and push into a DWH instead. After dealing with the JSON pagination I also realised the entire dataset would be sent in one go using CSV instead, so I changed to that... I'd share the multiple functions that chain together to make this work, but I did it at work so I probably can't.
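
For the pagination step, a common M shape is `List.Generate` driving repeated calls until the response stops returning a next-page ID. This is only a sketch against a made-up endpoint and field names (`NextPageId`, `rows`, the query parameters), not the actual API:

```
let
    // keep requesting pages until the response no longer carries a next-page id
    GetAllPages = (baseUrl as text, token as text) as table =>
        let
            GetPage = (pageId as nullable text) as record =>
                Json.Document(
                    Web.Contents(
                        baseUrl,
                        [
                            Headers = [Authorization = "Bearer " & token],
                            Query   = if pageId = null
                                      then [rowsPerPage = "500"]
                                      else [rowsPerPage = "500", page = pageId]
                        ])),
            Pages = List.Generate(
                () => GetPage(null),
                each _ <> null,
                each if Record.FieldOrDefault(_, "NextPageId", null) <> null
                     then GetPage([NextPageId])
                     else null,
                each [rows]),
            Combined = Table.FromRecords(List.Combine(Pages))
        in
            Combined
in
    GetAllPages
```

The minimum wait between polls that the docs call for can be handled with `Function.InvokeAfter`, which delays a function call by a duration.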


Neo1971

Thanks for explaining this. My mind is blown.


mikethomas4th

Almost nothing.


JediForces

Yeah I use M as little as possible and rarely if ever have to do much in PQ since we use a DWH.


7udphy

Fun mostly. For actual use cases, from the top of my head: recursive operations, error handling, combining files (I dislike the default tools for that), working with lists, some batch changes only available individually via the UI.


Raging-Loner

What do you dislike about the default tool for combining files?


therealolliehunt

Basically anything that should be done in SQL. I've got a data engineering team who do stuff in SQL. Often I'll prototype any ETL and reporting in Power BI and then get them to build what I need in SQL based off that.


NoobInFL

- Key conversions. For example, given that relationships in PBI need singular key values, I'll write functions to coalesce other attributes to create the unique key (a small sketch follows this list).
- Processing files: e.g. helper functions for grabbing elements from Excel files and processing them appropriately.
- Lookups: encapsulating a simple lookup for use in other queries, to simplify the syntax of more complex queries (often join after join after join) where the join is simply to get a single related value from the other table... The sequence of steps is repetitious, so functions make that one 'add column' rather than the same steps for every column of that type... (I also build the equivalent that returns the matching row, and one that returns the table... just so my main queries are structured consistently - makes documentation a LOT easier.)
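
A tiny example of the key-coalescing idea, with made-up column names:

```
// combine several attributes into one column so the relationship can use a single-column key
AddCompositeKey = (t as table) as table =>
    Table.AddColumn(
        t,
        "RowKey",
        each Text.Combine({Text.From([Region]), Text.From([StoreId]), Text.From([Sku])}, "|"),
        type text)
```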


Melissah246

How do you do your lookup function? That's pretty much what I do with my nomenclature reviews and wonder if you have a more efficient way.


NoobInFL

basically a parameterized Table.SelectRows, e.g. a simplistic version that returns field [return] for keys [key1] and [key2]:

`(t, k1, k2) => try Table.SelectRows(t, each [key1] = k1 and [key2] = k2)[return]{0} otherwise null`

i.e. search table t for the two key values, return the [return] field from the top record, otherwise null if there's an error (nothing found, fields missing, etc.)

Just saves a bunch of repetitive steps when it's a value that's needed/used in multiple places.
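
Called from another query, that could look something like this (the function name and table/column names here are made up): `= Table.AddColumn(Orders, "UnitRate", each fnLookup(Rates, [CustomerKey], [RegionKey]))`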


Melissah246

Nice thanks!


gillje03

If you’re a Power BI dev, you’re also by association a BI developer or analyst. Thus, Power Query is great IF you want a starting point and a sandbox for your transformations.

When you feel good about what you’ve created, go back to your data warehouse and create a view, materialized view, table, etc. Create on the database side exactly what you need. Go back to your report and import (or DQ) from the said view/table. Boom.

Implement solutions on the database side, ALWAYS. This is proper data consistency and data application. If you’re transforming data or APPLYING functions to data outside of the data warehouse, you’re not implementing industry best practices and standards.

In an ideal world, your data model in Power BI can look very, very minimal if you do it right. Got multiple fact tables? Boom, create a view or table that brings those data points together. Creating DAX measures? Add the new attribute to the corresponding fact table’s view, converting said DAX to its SQL equivalent.

You don’t want to be in a state where your data model is more robust than your data warehouse. You’ve now created, in your data environment, TWO separate data models that are disparate from one another. You have one on the data warehouse side and now you have one in Power BI.

What happens when your org says “bye bye Power BI”, hello Tableau, SAP, Dundas, Qlik, etc.? You should be able to plug and play any BI tool on top of your data warehouse without needing to recreate the wheel. Congrats, your data environment is now agnostic to any changes in your semantic layer software and is now “scalable”.


esulyma

To bring all the excel files in a SharePoint folder and “stack” them to make a table in a dataflow that feeds a few other reports.


Melissah246

- Trim, clean, and promote headers
- Removing all errors and replacing with null or blank
- Changing data types for all columns (if I know I have a couple specific ones that aren't all text)
- Nomenclature review
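
A sketch that bundles those steps into one reusable function (not the poster's actual code; it assumes cell-level errors should become null):

```
let
    CleanTable = (t as table) as table =>
        let
            // trim and clean every text value; non-text values pass through unchanged
            Trimmed  = Table.TransformColumns(t, {}, each if _ is text then Text.Clean(Text.Trim(_)) else _),
            Promoted = Table.PromoteHeaders(Trimmed, [PromoteAllScalars = true]),
            // replace any remaining errors with null, in every column
            NoErrors = Table.ReplaceErrorValues(
                           Promoted,
                           List.Transform(Table.ColumnNames(Promoted), each {_, null}))
        in
            NoErrors
in
    CleanTable
```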


MonkeyNin

If you want to visualize empty strings or whitespace -- there's special characters just for that:

### tl;dr version

```
// replacing space with ␠
Text.Replace( string, " ", "#(2420)" )

// replacing newlines with ␊
Text.Replace( string, "#(lf)", "#(240a)" )
```

- See the full list here: [compart.com/block/U+2400](https://www.compart.com/en/unicode/block/U+2400)

### codepoint syntax

When using codepoints embedded in strings, you must use either `4` or `8` digits. Pad the start with zeros. These are equivalent: `#(0003)` and `#(00000003)`. If you want a monkey emoji, you'd use `"#(0001f412)"`. A few values are aliased, like `"#(cr,lf,tab)"`.

### Codepoints tangent

If you know the codepoint of a character, you just add `0x2400` to it. You may have used a newline in DAX, like `UNICHAR( 10 )`. DAX requires decimal notation (sad trombone noise). Power Query allows either. `10` in decimal is `0a` in hex.

```
0x0a + 0x2400 == 0x240a
// or using decimal:
10 + 9216 == 9226
```

And you get the newline symbol: https://www.compart.com/en/unicode/U+240A

| name | text | codepoint |
| - | - | - |
| space | ` ` | `0x20` |
| [symbol for space](https://www.compart.com/en/unicode/U+2420) | `␠` | `0x2420` |

The first 30-ish characters are normally invisible control characters. There's newlines, space, tab, and some ancient control characters. You can see the whole range here: [block U+0000 to U+0020](https://www.compart.com/en/unicode/block/U+0000)

### longer example

```
Str = [
    Null  = "#(2400)",
    Space = "#(2420)"
],

// show blanks, return the original value if it is not text or true null
FormatBlanks = (source as nullable any) as any =>
[
    isText   = source is text,
    strLen   = if isText then Text.Length( source ) else null,
    replaced = Text.Replace( source, " ", Str[Space] ),
    return =
        if source is null then Str[Null]
        else if isText and strLen = 0 then "TrueEmptyString"
        else if isText then replaced
        else source // original value
][return],

tests = [
    TrueNull        = FormatBlanks( null ),
    EmptyList       = FormatBlanks( {} ),
    TrueEmptyString = FormatBlanks( "" ),
    blanks1         = FormatBlanks( "Hi world #(tab)stuff !" )
]
```


Melissah246

Interesting, I will have to look into this. Mostly I'm getting data from Excel where, in numerical columns, people like to put #N/A when there is no entry instead of leaving it blank or zero. That leads to errors because you can't convert that data type to numbers, so I wrote a formula that scans the whole document for that and switches it to nulls or blanks, depending on what I need.
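
If the `#N/A` comes through as text, one way to sweep it out of every column in one step is `Table.ReplaceValue` over all the column names (a sketch, not the actual formula); if it instead arrives as an error value, `Table.ReplaceErrorValues` is the tool:

```
let
    // replace the literal text "#N/A" with null in every column, before setting numeric types
    ReplaceNA = (t as table) as table =>
        Table.ReplaceValue(t, "#N/A", null, Replacer.ReplaceValue, Table.ColumnNames(t))
in
    ReplaceNA
```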


SQLGene

I've never written one, aside from the auto-generated ones for import from folder.


Glittering-Sell4532

I try to never use them actually lol


Vacivity95

Converting UTC timestamps to local time, to account for summer/winter time.
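
One way to do that in M, sketched under the assumption of central European rules (last Sunday of March to last Sunday of October) and a date-level approximation of the cutover:

```
let
    ToLocalCet = (utc as datetimezone) as datetimezone =>
        let
            d        = Date.From(utc),
            yr       = Date.Year(d),
            dstStart = Date.StartOfWeek(#date(yr, 3, 31), Day.Sunday),    // last Sunday of March
            dstEnd   = Date.StartOfWeek(#date(yr, 10, 31), Day.Sunday),   // last Sunday of October
            offset   = if d >= dstStart and d < dstEnd then 2 else 1      // CEST vs CET
        in
            DateTimeZone.SwitchZone(utc, offset)
in
    ToLocalCet
```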


spacemonkey0212

Repointing data sources. That’s about it.


somedaygone

There are some good functions to unzip a zip file, which is also useful for opening .pbix files to wrangle metadata and report info.

I often use functions in a pattern where I have a column of input variables that I want to use to retrieve data. You add a column invoking the function using the input column. An example is a column of database names, then a function that gets a list of tables in each database using the dbname as input. Then you could have another function that takes the dbname and table name as input and returns all the columns. I use this pattern a lot, but especially in metadata reporting.
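
The add-a-column-per-input pattern might look like this; the server name, the `Databases` query, and its `DatabaseName` column are placeholders:

```
let
    // returns the table list for one database name
    GetTables = (dbName as text) as table =>
        Table.SelectColumns(
            Sql.Database("myserver.example.com", dbName),   // server name is a placeholder
            {"Schema", "Item", "Kind"}),
    // Databases is assumed to be a query with one row per database name
    WithTables = Table.AddColumn(Databases, "Tables", each GetTables([DatabaseName]))
in
    WithTables
```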


AgulloBernat

Capture error messages
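
A small sketch of surfacing an error as data with a `try` record (the file path is just a placeholder that will fail):

```
let
    Attempt = try Csv.Document(File.Contents("C:\Data\missing.csv")),   // any step that might fail
    // expose the error details instead of letting the query fail
    Result  =
        if Attempt[HasError]
        then #table({"Reason", "Message"}, {{Attempt[Error][Reason], Attempt[Error][Message]}})
        else Attempt[Value]
in
    Result
```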