Assuming you found answers to your questions in my earlier blog and are now implementing a Tableau node cluster as your enterprise reporting solution, I want to cover a few more aspects of Tableau.
I am writing this blog to share, in brief, the lessons we learned the hard way. Most of them are specific to Tableau, but some are generic.
New Tableau developers and architects tend to make common mistakes that may prove costly over time. If your reporting framework still troubles you despite using Tableau, something is probably wrong with its implementation.
EquationsWork has extensive experience implementing Tableau for large enterprises; for more information, please contact us.
You must know how many users, and therefore how many licenses, you are going to need, not only in the short term but also in the long term. A very common mistake is adopting Tableau with little or no understanding of this. Tableau typically sells named and core licenses. If you are building a system where you intend to publish reports to end users of your application, then under the named license model you must purchase a named license for every user who needs to view a Tableau report. Named licenses are flexible and work well for a smaller user base. For a large user base a core license makes sense, but in that case your processing power is limited to the licensed cores, and it is expensive.
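To make the named vs. core trade-off concrete, here is a minimal sketch of the arithmetic. The prices in the usage note are placeholders I made up purely for illustration, not Tableau's actual pricing, and the function names are hypothetical; plug in the quotes you get from your Tableau sales contact.

```python
import math


def cheaper_license(users, named_price_per_user, core_price_per_core, cores):
    """Return the cheaper model and its total cost for a given user count.

    All prices are inputs supplied by the caller; nothing here reflects
    real Tableau pricing.
    """
    named_total = users * named_price_per_user
    core_total = cores * core_price_per_core
    if named_total <= core_total:
        return "named", named_total
    return "core", core_total


def break_even_users(named_price_per_user, core_price_per_core, cores):
    """Smallest user count at which named licenses cost at least as much as core."""
    return math.ceil(cores * core_price_per_core / named_price_per_user)
```

With placeholder prices of 1000 per named user and 20000 per core on an 8-core server, `break_even_users(1000, 20000, 8)` gives 160 users. Run the same calculation against your long-term user forecast, not just the launch numbers.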
Handling large data
Processing large amounts of data is a challenge for every tool. Connecting Tableau directly to a large time-series table or collection is the wrong approach. You should always compute a summary first and use that summary in Tableau Desktop to create reports. Choosing between Live and Extract mode is a key decision for performance.
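As a rough illustration of computing a summary before Tableau ever sees the data, the sketch below collapses raw time-series readings into one row per day. The function name and the output columns are my own choices for this example; in practice you would usually do this in the database or in your ETL job and point Tableau at the resulting summary table.

```python
from collections import defaultdict
from datetime import datetime


def summarize_daily(rows):
    """Collapse (timestamp, value) readings into one summary row per day."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts.date()].append(value)
    return [
        {
            "day": day.isoformat(),
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
        }
        for day, vals in sorted(buckets.items())
    ]


# Example input: three raw readings spread over two days.
readings = [
    (datetime(2017, 1, 1, 9, 0), 10),
    (datetime(2017, 1, 1, 17, 30), 20),
    (datetime(2017, 1, 2, 8, 15), 5),
]
daily = summarize_daily(readings)
```

Here three raw rows become two summary rows. On a real time-series table the reduction is typically several orders of magnitude, which is what keeps the report responsive.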
Limit data points
Knowing how much data you need to visualize is a critical investigation to complete before you choose a reporting tool. The key is to render summarized data on reports wherever possible. Avoid visualizing thousands of data points on a single report: when you apply formulas or add multiple measures, the report ends up doing far more processing (much like a Cartesian join in SQL).
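One simple way to cap the number of points is to average the series into fixed-size buckets before it reaches the report. This is only a sketch of that idea, with a hypothetical function name; bucket averaging is my choice here, and other reductions (min/max per bucket, sampling) may suit your data better.

```python
import math


def downsample(values, max_points):
    """Reduce a series to at most max_points values by averaging equal-size buckets."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    if len(values) <= max_points:
        return list(values)
    # Bucket size is rounded up so the number of buckets never exceeds max_points.
    bucket = math.ceil(len(values) / max_points)
    result = []
    for start in range(0, len(values), bucket):
        chunk = values[start:start + bucket]
        result.append(sum(chunk) / len(chunk))
    return result
```

For example, `downsample(list(range(10)), 3)` averages the ten values into three buckets and returns `[1.5, 5.5, 8.5]`.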
Connecting to NoSQL
Tableau can connect to almost all NoSQL databases, but you need an ODBC-based driver to read from them. Simba is one of the market leaders for third-party bi-directional ODBC connectors. When using such drivers, limit the amount of data you process through them to get better performance; for example, point them at summarized collections.
Once you understand the data requirements, you need to choose the deployment strategy carefully. Whether you need one fat server, a 3-node cluster, or a 5-node cluster is a very important decision because it directly impacts your budget. You need a good understanding of the core Tableau components, because how you deploy and separate them depends on the nature of your application; for example, how you distribute the data server, web application, and repository across the cluster is a key decision.
Secure access is an unavoidable requirement of any enterprise, and Tableau offers a very good way to authenticate requests with its Trusted Authentication feature. The key is to set it up so that you do not end up reconfiguring it every time your company's network policies change. Working out which IP addresses to register, and configuring them in an environment that sits behind a UTM or spans different subnets, can be a bit tricky. You should always write a warm-up script that identifies the relevant IP addresses and configures them on Tableau Server, instead of configuring them manually.
While working with a large investment bank, I realized that you simply cannot take database access for granted, even when you are accessing the data with a tool like Tableau inside their secure ecosystem. Tableau has a solution for this scenario: the Tableau API. You can write programs in languages like Python or Java that fetch data through the API and generate a TDE (Tableau Data Extract), which you can host on Tableau Server. The downsides are the performance impact with large data and the need for the API as a prerequisite.
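For context, this is roughly what the trusted ticket exchange looks like from the web application's side, as I understand Tableau's Trusted Authentication flow: the application POSTs the username to the server's `/trusted` endpoint and gets back either a ticket or `-1`. A `-1` is what you typically see when the calling machine's IP is not in the trusted list, which is exactly the situation the warm-up script is meant to prevent. The server URL, user name, and function names below are hypothetical; check the endpoint and field names against the documentation for your Tableau Server version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def build_ticket_request(server, username, site=None):
    """Build the POST request that asks Tableau Server for a trusted ticket."""
    fields = {"username": username}
    if site:
        fields["target_site"] = site  # only needed for a non-default site
    body = urlencode(fields).encode("utf-8")
    return Request(f"{server.rstrip('/')}/trusted", data=body, method="POST")


def fetch_ticket(server, username, site=None):
    """Send the request; must run on a host whose IP Tableau Server trusts."""
    with urlopen(build_ticket_request(server, username, site), timeout=10) as resp:
        return resp.read().decode("utf-8").strip()


def trusted_view_url(server, ticket, view_path, site=None):
    """Turn a ticket into the URL the end user's browser should load."""
    ticket = ticket.strip()
    if ticket == "-1":
        # Usually means an untrusted IP address or an unknown user.
        raise PermissionError("Tableau Server refused to issue a trusted ticket")
    site_part = f"/t/{quote(site)}" if site else ""
    return f"{server.rstrip('/')}/trusted/{ticket}{site_part}/views/{quote(view_path)}"
```

A warm-up script can call `fetch_ticket` from each application host after a deployment or network change and fail fast on `-1`, so you find a missing trusted IP before your users do.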
Monitor Tableau Server
For critical applications, it is very important to monitor all Tableau components running on the cluster, along with the Tableau Server admin console, which in turn covers all reports. Tools like Nagios can be configured to monitor resources. Having a DR site is advisable for mission-critical applications.
Setting up a Tableau cluster can be tedious if you are doing it for the first time, so documenting the steps is advisable, but scripting them is much better because you can reuse the script. If you are on the cloud, you can save an image of the initial setup, e.g. an AMI on AWS. Having dev, stage, and prod environments is advisable to stay aligned with your DevOps process. Three installations are allowed with a single Tableau license (this may change over time, so please verify with Tableau support when you implement).
Performance, security, and scalability are essential for every business and are sometimes the reason for adopting Tableau. Tableau supports all three of these key non-functional aspects out of the box.
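A Nagios check is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below shows only the decision logic for such a check. How you collect the per-component statuses is left out, and the status strings `Active` and `Busy` and the component names are assumptions for illustration; map them to whatever your Tableau Server version actually reports.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed status vocabulary; adjust to what your server reports.
HEALTHY = {"Active"}
DEGRADED = {"Busy"}


def evaluate(statuses):
    """Map {component name: status} to a (Nagios exit code, message) pair."""
    if not statuses:
        return UNKNOWN, "UNKNOWN - no component status received"
    down = sorted(n for n, s in statuses.items() if s not in HEALTHY | DEGRADED)
    busy = sorted(n for n, s in statuses.items() if s in DEGRADED)
    if down:
        return CRITICAL, "CRITICAL - not running: " + ", ".join(down)
    if busy:
        return WARNING, "WARNING - degraded: " + ", ".join(busy)
    return OK, f"OK - {len(statuses)} components active"
```

In the actual plugin script you would print the message and call `sys.exit` with the code, so Nagios can raise alerts per node of the cluster.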