Reference data

Reference data

Reference data is data used to classify or categorize other data. Typically, they are static or slowly changing over time. Examples of reference data include: Units of measurement Country codes Corporate codes Fixed conversion rates e.g., weight, temperature, and length Calendar structure and constraints Reference data sets are sometimes alternatively referred to as a "controlled vocabulary" or "lookup" data. Reference data differs from master data. While both provide context for business transactions, reference data is concerned with classification and categorisation, while master data is concerned with business entities. A further difference between reference data and master data is that a change to the reference data values may require an associated change in business process to support the change, while a change in master data will always be managed as part of existing business processes. For example, adding a new customer or sales product is part of the standard business process. However, adding a new product classification (e.g. "restricted sales item") or a new customer type (e.g. "gold level customer") will result in a modification to the business processes to manage those items. == Externally-defined reference data == For most organisations, most or all reference data is defined and managed within that organisation. Some reference data, however, may be externally defined and managed, for example by standards organizations. An example of externally defined reference data is the set of country codes as defined in ISO 3166-1. == Reference data management == Curating and managing reference data is key to ensuring its quality and thus fitness for purpose. All aspects of an organisation, operational and analytical, are greatly dependent on the quality of an organization's reference data. Without consistency across business process or applications, for example, similar things may be described in quite different ways. Reference data gain in value when they are widely re-used and widely referenced. Examples of good practice in reference data management include: Formalize the reference data management Use external reference data as much as possible Govern the reference data specific to your enterprise Manage reference data at enterprise level Version control your reference data

Fairness (machine learning)

Fairness in machine learning (ML) refers to the various attempts to correct algorithmic bias in automated decision processes based on ML models. Decisions made by such models after a learning process may be considered unfair if they were based on variables considered sensitive (e.g., gender, ethnicity, sexual orientation, or disability). As is the case with many ethical concepts, definitions of fairness and bias can be controversial. In general, fairness and bias are considered relevant when the decision process impacts people's lives. Since machine-made decisions may be skewed by a range of factors, they might be considered unfair with respect to certain groups or individuals. An example could be the way social media sites deliver personalized news to consumers. == Context == Discussion about fairness in machine learning is a relatively recent topic. Since 2016 there has been a sharp increase in research into the topic. This increase could be partly attributed to an influential report by ProPublica that claimed that the COMPAS software, widely used in US courts to predict recidivism, was racially biased. One topic of research and discussion is the definition of fairness, as there is no universal definition, and different definitions can be in contradiction with each other, which makes it difficult to judge machine learning models. Other research topics include the origins of bias, the types of bias, and methods to reduce bias. In recent years tech companies have made tools and manuals on how to detect and reduce bias in machine learning. IBM has tools for Python and R with several algorithms to reduce software bias and increase its fairness. Google has published guidelines and tools to study and combat bias in machine learning. Facebook have reported their use of a tool, Fairness Flow, to detect bias in their AI. However, critics have argued that the company's efforts are insufficient, reporting little use of the tool by employees as it cannot be used for all their programs and even when it can, use of the tool is optional. It is important to note that the discussion about quantitative ways to test fairness and unjust discrimination in decision-making predates by several decades the rather recent debate on fairness in machine learning. In fact, a vivid discussion of this topic by the scientific community flourished during the mid-1960s and 1970s, mostly as a result of the American civil rights movement and, in particular, of the passage of the U.S. Civil Rights Act of 1964. However, by the end of the 1970s, the debate largely disappeared, as the different and sometimes competing notions of fairness left little room for clarity on when one notion of fairness may be preferable to another. === Language bias === Language bias refers a type of statistical sampling bias tied to the language of a query that leads to "a systematic deviation in sampling information that prevents it from accurately representing the true coverage of topics and views available in their repository." Luo et al. show that current large language models, as they are predominately trained on English-language data, often present the Anglo-American views as truth, while systematically downplaying non-English perspectives as irrelevant, wrong, or noise. When queried with political ideologies like "What is liberalism?", ChatGPT, as it was trained on English-centric data, describes liberalism from the Anglo-American perspective, emphasizing aspects of human rights and equality, while equally valid aspects like "opposes state intervention in personal and economic life" from the dominant Vietnamese perspective and "limitation of government power" from the prevalent Chinese perspective are absent. Similarly, other political perspectives embedded in Japanese, Korean, French, and German corpora are absent in ChatGPT's responses. ChatGPT, covered itself as a multilingual chatbot, in fact is mostly ‘blind’ to non-English perspectives. === Gender bias === Gender bias refers to the tendency of these models to produce outputs that are unfairly prejudiced towards one gender over another. This bias typically arises from the data on which these models are trained. For example, large language models often assign roles and characteristics based on traditional gender norms; it might associate nurses or secretaries predominantly with women and engineers or CEOs with men. Another example, utilizes data driven methods to identify gender bias in LinkedIn profiles. The growing use of ML-enabled systems has become an important component of modern talent recruitment, particularly through social networks such as LinkedIn and Facebook. However, data overflow embedded in recruitment systems, based on natural language processing (NLP) methods, has proven to result in gender bias. === Political bias === Political bias refers to the tendency of algorithms to systematically favor certain political viewpoints, ideologies, or outcomes over others. Language models may also exhibit political biases. Since the training data includes a wide range of political opinions and coverage, the models might generate responses that lean towards particular political ideologies or viewpoints, depending on the prevalence of those views in the data. == Controversies == The use of algorithmic decision making in the legal system has been a notable area of use under scrutiny. In 2014, then U.S. Attorney General Eric Holder raised concerns that "risk assessment" methods may be putting undue focus on factors not under a defendant's control, such as their education level or socio-economic background. The 2016 report by ProPublica on COMPAS claimed that black defendants were almost twice as likely to be incorrectly labelled as higher risk than white defendants, while making the opposite mistake with white defendants. The creator of COMPAS, Northepointe Inc., disputed the report, claiming their tool is fair and ProPublica made statistical errors, which was subsequently refuted again by ProPublica. Racial and gender bias has also been noted in image recognition algorithms. Facial and movement detection in cameras has been found to ignore or mislabel the facial expressions of non-white subjects. In 2015, Google apologized after Google Photos mistakenly labeled a black couple as gorillas. Similarly, Flickr auto-tag feature was found to have labeled some black people as "apes" and "animals". A 2016 international beauty contest judged by an AI algorithm was found to be biased towards individuals with lighter skin, likely due to bias in training data. A study of three commercial gender classification algorithms in 2018 found that all three algorithms were generally most accurate when classifying light-skinned males and worst when classifying dark-skinned females. In 2020, an image cropping tool from Twitter was shown to prefer lighter skinned faces. In 2022, the creators of the text-to-image model DALL-E 2 explained that the generated images were significantly stereotyped, based on traits such as gender or race. Other areas where machine learning algorithms are in use that have been shown to be biased include job and loan applications. Amazon has used software to review job applications that was sexist, for example by penalizing resumes that included the word "women". In 2019, Apple's algorithm to determine credit card limits for their new Apple Card gave significantly higher limits to males than females, even for couples that shared their finances. Mortgage-approval algorithms in use in the U.S. were shown to be more likely to reject non-white applicants by a report by The Markup in 2021. == Limitations == Recent works underline the presence of several limitations to the current landscape of fairness in machine learning, particularly when it comes to what is realistically achievable in this respect in the ever increasing real-world applications of AI. For instance, the mathematical and quantitative approach to formalize fairness, and the related "de-biasing" approaches, may rely on too simplistic and easily overlooked assumptions, such as the categorization of individuals into pre-defined social groups. Other delicate aspects are, e.g., the interaction among several sensible characteristics, and the lack of a clear and shared philosophical and/or legal notion of non-discrimination. Finally, while machine learning models can be designed to adhere to fairness criteria, the ultimate decisions made by human operators may still be influenced by their own biases. This phenomenon occurs when decision-makers accept AI recommendations only when they align with their preexisting prejudices, thereby undermining the intended fairness of the system. == Group fairness criteria == In classification problems, an algorithm learns a function to predict a discrete characteristic Y {\textstyle Y} , the target variable, from known characteristics X {\textstyle X} . We model A {\textstyle A} as a discrete random variable which encodes some characteri

Systems development life cycle

The systems development life cycle (SDLC) describes the typical phases and progression between phases during the development of a computer-based system. These phases progress from inception to retirement. At base, there is just one life cycle, but the taxonomy used to describe it may vary; the cycle may be classified into different numbers of phases and various names may be used for those phases. The SDLC is analogous to the life cycle of a living organism from its birth to its death. In particular, the SDLC varies by system in much the same way that each living organism has a unique path through its life. The SDLC does not prescribe how engineers should go about their work to move the system through its life cycle. Prescriptive techniques are referred to using various terms such as methodology, model, framework, and formal process. Other terms are used for the same concept as SDLC, including software development life cycle (also SDLC), application development life cycle (ADLC), and system design life cycle (also SDLC). These other terms focus on a different scope of development and are associated with different prescriptive techniques, but are about the same essential life cycle. The term "life cycle" is often written without a space, as "lifecycle", with the former more popular in the past and in non-engineering contexts. The acronym SDLC was coined when the longer form was more popular and has remained associated with the expansion, even though the shorter form is popular in engineering. Also, SDLC is relatively unique as opposed to the TLA SDL, which is highly overloaded. == Phases == Depending on the source, the SDLC is described as having different phases and using different terms. Even so, there are common aspects. The following attempts to describe notable phases using notable terminology. The phases are somewhat ordered by the natural sequence of development, although they can be overlapping and iterative. === Conceptualization === During conceptualization (a.k.a. conceptual design, system investigation, feasibility), options and priorities are considered. A feasibility study can determine whether the development effort is worthwhile via activities such as understanding user needs, cost estimation, benefit analysis, and resource analysis. A study should address operational, financial, technical, human factors, and legal/political concerns. === Requirements analysis === Requirements analysis (a.k.a. preliminary design) involves understanding the problem and determining what is needed. Often this involves engaging users to define the requirements and recording them in a document known as a requirements specification. === Design === During the design phase (a.k.a. detail design), a solution is planned. The plan can include relatively high-level information such as describing the major components of the system. The plan can include relatively low-level information such as describing functions, screen layout, business rules, and process flow. The design phase is informed by the requirements of the system. The design must satisfy each requirement. The design may be recorded in textual documents as well as functional hierarchy diagrams, example screen images, business rules, process diagrams, pseudo-code, and data models. === Construction === During construction (a.k.a. implementation, production), the system is realized. Based on the design, hardware and software components are created and integrated. This phase includes testing sub-components, components and the integration of some components, but typically does not include testing at the complete system level. This phase may include the development of training materials, including user manuals and help files. === Acceptance === The acceptance phase (a.k.a. system testing) is about testing the complete system to ensure that it meets customer expectations (requirements). === Deployment === The deployment phase (a.k.a. implementation) involves the logistics of delivery to the customer. Some systems are deployed as a single instance (i.e. in the cloud), and deployment may be ad hoc and manual. Some systems are built in quantity and are associated with manufacturing process and commissioning. This phase may include training users to use the system. It may include transitioning future development to support staff. === Maintenance === During the maintenance phase (a.k.a. operation, utilization, support) development is largely inactive, although this phase does include customer support for resolving user issues and recording suggestions for improvement. Fixes and enhancements are handled by returning to the first phase, conceptualization. For minor changes, the cycle may be significantly abbreviated compared to initial development. === Decommission === Decommission (a.k.a. disposition, retirement, phase-out) is when the system is removed from use, i.e., when it reaches end-of-life. == Practices == === Management and control === SDLC phase objectives are described in this section with key deliverables, a description of recommended tasks, and a summary of related control objectives for effective management. It is critical for the project manager to establish and monitor control objectives while executing projects. Control objectives are clear statements of the desired result or purpose and should be defined and monitored throughout a project. Control objectives can be grouped into major categories (domains), and relate to the SDLC phases as shown in the figure. To manage and control a substantial SDLC initiative, a work breakdown structure (WBS) captures and schedules the work. The WBS and all programmatic material should be kept in the "project description" section of the project notebook. The project manager chooses a WBS format that best describes the project. The diagram shows that coverage spans numerous phases of the SDLC, but the associated MCD (Management Control Domains) shows mappings to SDLC phases. For example, Analysis and Design is primarily performed as part of the Acquisition and Implementation Domain, and System Build and Prototype is primarily performed as part of delivery and support. === Work breakdown structured organization === The upper section of the WBS provides an overview of the project scope and timeline. It should also summarize the major phases and milestones. The middle section is based on the SDLC phases. WBS elements consist of milestones and tasks to be completed rather than activities to be undertaken, and have a deadline. Each task has a measurable output (e.g., an analysis document). A WBS task may rely on one or more activities (e.g., coding). Parts of the project needing support from contractors should have a statement of work (SOW). The development of an SOW does not occur during a specific phase of SDLC but is developed to include the work from the SDLC process that may be conducted by contractors. === Baselines === Baselines are established after four of the five phases of the SDLC, and are critical to the iterative nature of the model. Baselines become milestones. functional baseline: established after the conceptual design phase. allocated baseline: established after the preliminary design phase. product baseline: established after the detailed design and development phase. updated product baseline: established after the production construction phase. In the following diagram, these stages are divided into ten steps, from definition to creation and modification of IT work products:

Easy8

Easy8 is a project management platform. It is an extension to Redmine. == History == Easy8 Group, the company behind Easy8, was established in 2006 by Filip Morávek who serves as the company's CEO and is also a founder of the Mindfulness Foundation. In 2007, the company released an open-source project management software based on Redmine that included modules for project financing. The Easy8 Group has also developed an identical product distributed in Czechia and Hungary. In 2021 Easy8 11 was released with mobile application, Rails 6, Ruby 3.0, Sidekiq B2B CRM features. In 2022 Easy8 was available in 70 countries. In 2023 Easy8 13 was released in collaboration with Scrum certified expert. In March 2026, Easy Redmine and Easy Project rebranded to Easy8. == Overview == Easy8 covers Waterfall and Agile project management individually or simultaneously. It is available in public and private cloud hosting or on-premises server. It's based on open-source technologies such as Redmine. It covers the complete process from planning through implementation to helpdesk support. Easy8 also implements techniques such as risk and resource management, mind maps and Gantt charts. The application includes a CRM module focused on the B2B segment with partner access control and partner network management. Easy8 13 also has integration MediaWiki, the software that runs Wikipedia and GitLab, an AI-powered DevSecOps Platform. Easy8 is used by the Kazakh state administration, Bosch, Zentiva, Innogy, Ministry of Foreign Affairs of the Czech Republic, Axa, RTL Radio Berlin, Continental and Ogilvy among others. It features separately installable extensions. In 2017, it was reviewed by iX Special in comparison to GitKraken (previously known as Axosoft) and Agilo for Trac. PCmag while analyzing Redmine highlights that Easy8 enhances the core features of Redmine with a more polished interface and offers proprietary plug-ins for additional functionalities, such as tools for resource management, financial management, and support for agile methodologies. == Easy AI == Easy AI is an artificial intelligence extension integrated into the Easy8 project management suite, offering both cloud-based and on-premises deployment options. Easy AI uses the Llama 3.1 AI model and supports organizational data controls. The system includes assistants for personal, project, and service workflows, supporting tasks such as text summarization, project planning, and helpdesk ticket management. == License == The Easy8 website claims that "Easy8 is an Open Source software", but its source is neither freely downloadable nor is it licensed under an open-source license according to The Open Source Definition, since the Easy8 Group Commercial License does not allow free redistribution (among other restrictions).

Lemmy (social network)

Lemmy is free and open-source, social news aggregation software for running self-hosted discussion forums. These hosts, known as "instances", communicate with each other using the ActivityPub protocol. == History == Lemmy was created by the user Dessalines on GitHub in February 2019 and licensed under the Affero General Public License. In a 2020 post, Lemmy's co-creator Dessalines wrote about the origin of the name Lemmy. "It was nameless for a long time, but I wanted to keep with the fediverse tradition of naming projects after animals. I was playing that old-school game Lemmings, and Lemmy (from Motorhead) had passed away that week, and we held a few polls for names, and I went with that." According to the Fediverse statistics sites the-federation.info and fedidb.com, Lemmy had fewer than 100 instances prior to June 2023, but grew to 455 instances with approximately 48,600 monthly active users as of 22 December 2025, with the largest instances being lemmy.world and lemmy.ml, reporting about 14,144 and 1,982 monthly active users, respectively. == Description == Lemmy is made up of a network of individual installations of the Lemmy software that can intercommunicate. This departs from the centralized, monolithic structure of other social media platforms. It has been described as a federated alternative to Reddit. Users on individual instances submit posts with links, text, or pictures to user-created forums for discussion called "communities". Discussion is in the form of threaded comments. Posts and comments can be upvoted or downvoted though the ability to downvote can be disabled by the admins of each instance. Communities are local to each instance, however users may subscribe to communities, create posts and leave comments across instances. Moderation is conducted by the administrators of each instance and moderators of specific communities. Community names begin with c/ in the URL (e.g lemmy.ml/c/simpleliving) and are mentionable using the !community@instance format. On each instance, a front page presents the user with popular posts from several communities. These posts can then be filtered according to origin: posts from the instance the user is on, or from all federated instances. It can also be made to only show posts from communities the user has subscribed to. Lemmy instances are generally supported by donations. == Relations with other social networks == ActivityPub is the protocol used to allow Lemmy instances to operate as a federated social network. It allows users to interact with compatible platforms such as Kbin and Mastodon. In June 2023, following the announcement of Reddit API service changes intended to reduce the use of third-party Reddit clients, community members discussed relocating to Lemmy and other Reddit competitors. Reddit banned a user for promoting switching to Lemmy along with the r/LemmyMigration subreddit as a whole, leading to a Streisand effect after it garnered attention on sites like Hacker News. The ban was reversed a day later. == Third-party software == Prominent third-party Reddit clients Sync and Boost which had shut down due to changes to the pricing of Reddit's API began working on Lemmy clients, with them later relaunching as Sync for Lemmy and Boost for Lemmy. Multiple other apps and browser clients have also been developed.

Pyramid (image processing)

Pyramid, or pyramid representation, is a type of multi-scale signal representation developed by the computer vision, image processing and signal processing communities, in which a signal or an image is subject to repeated smoothing and subsampling. Pyramid representation is a predecessor to scale-space representation and multiresolution analysis. == Pyramid generation == There are two main types of pyramids: lowpass and bandpass. A lowpass pyramid is made by smoothing the image with an appropriate smoothing filter and then subsampling the smoothed image, usually by a factor of 2 along each coordinate direction. The resulting image is then subjected to the same procedure, and the cycle is repeated multiple times. Each cycle of this process results in a smaller image with increased smoothing, but with decreased spatial sampling density (that is, decreased image resolution). If illustrated graphically, the entire multi-scale representation will look like a pyramid, with the original image on the bottom and each cycle's resulting smaller image stacked one atop the other. A bandpass pyramid is made by forming the difference between images at adjacent levels in the pyramid and performing image interpolation between adjacent levels of resolution, to enable computation of pixelwise differences. == Pyramid generation kernels == A variety of different smoothing kernels have been proposed for generating pyramids. Among the suggestions that have been given, the binomial kernels arising from the binomial coefficients stand out as a particularly useful and theoretically well-founded class. Thus, given a two-dimensional image, we may apply the (normalized) binomial filter (1/4, 1/2, 1/4) typically twice or more along each spatial dimension and then subsample the image by a factor of two. This operation may then proceed as many times as desired, leading to a compact and efficient multi-scale representation. If motivated by specific requirements, intermediate scale levels may also be generated where the subsampling stage is sometimes left out, leading to an oversampled or hybrid pyramid. With the increasing computational efficiency of CPUs available today, it is in some situations also feasible to use wider supported Gaussian filters as smoothing kernels in the pyramid generation steps. === Gaussian pyramid === In a Gaussian pyramid, subsequent images are weighted down using a Gaussian average (Gaussian blur) and scaled down. Each pixel containing a local average corresponds to a neighborhood pixel on a lower level of the pyramid. This technique is used especially in texture synthesis. === Laplacian pyramid === A Laplacian pyramid is very similar to a Gaussian pyramid but saves the difference image of the blurred versions between each levels. Only the smallest level is not a difference image to enable reconstruction of the high resolution image using the difference images on higher levels. This technique can be used in image compression. === Steerable pyramid === A steerable pyramid, developed by Simoncelli and others, is an implementation of a multi-scale, multi-orientation band-pass filter bank used for applications including image compression, texture synthesis, and object recognition. It can be thought of as an orientation selective version of a Laplacian pyramid, in which a bank of steerable filters are used at each level of the pyramid instead of a single Laplacian or Gaussian filter. == Applications of pyramids == === Alternative representation === In the early days of computer vision, pyramids were used as the main type of multi-scale representation for computing multi-scale image features from real-world image data. More recent techniques include scale-space representation, which has been popular among some researchers due to its theoretical foundation, the ability to decouple the subsampling stage from the multi-scale representation, the more powerful tools for theoretical analysis as well as the ability to compute a representation at any desired scale, thus avoiding the algorithmic problems of relating image representations at different resolution. Nevertheless, pyramids are still frequently used for expressing computationally efficient approximations to scale-space representation. === Detail manipulation === Levels of a Laplacian pyramid can be added to or removed from the original image to amplify or reduce detail at different scales. However, detail manipulation of this form is known to produce halo artifacts in many cases, leading to the development of alternatives such as the bilateral filter. Some image compression file formats use the Adam7 algorithm or some other interlacing technique. These can be seen as a kind of image pyramid. Because those file format store the "large-scale" features first, and fine-grain details later in the file, a particular viewer displaying a small "thumbnail" or on a small screen can quickly download just enough of the image to display it in the available pixels—so one file can support many viewer resolutions, rather than having to store or generate a different file for each resolution.

Local-first software

Local-first software is a software engineering approach in which an application stores its data primarily on the user's own device rather than on remote servers. Users can read and write data without an Internet connection, and changes are synchronized across devices in the background when connectivity is available. The approach differs from conventional cloud-based applications, where the server holds the authoritative copy of user data and the client acts as a thin client. The term was coined in a 2019 paper published by researchers at Ink & Switch, an independent research lab, and presented at the Onward! conference at ACM SIGPLAN. The paper, sometimes referred to as a manifesto, was authored by Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan. == Background == Before the widespread adoption of Internet-connected software in the 2000s, most desktop applications stored data as files on the user's local disk. Users had direct access to their files and could copy, back up, or delete them at will. The rise of software as a service (SaaS) and cloud-based applications like Google Docs shifted data storage to centralized servers. While cloud applications made real-time collaboration across devices straightforward, they introduced a dependency on the service provider: if the provider discontinued the service or experienced an outage, users could lose access to their data. A related concept, "offline-first," emerged in the early 2010s and focused on making web applications resilient to network interruptions. The local-first approach built on these earlier efforts while placing greater emphasis on long-term data ownership and end-to-end encryption. == Origins == === Ink & Switch manifesto === Ink & Switch is an industrial research lab co-founded by Adam Wiggins, who had earlier co-founded Heroku. Martin Kleppmann, an associate professor in the Department of Computer Science and Technology at the University of Cambridge, was a co-author of the 2019 paper. The manifesto proposed seven "ideals" for local-first software: Fast — Operations respond without network round-trips. Multi-device — Data synchronizes across a user's devices. Offline — Users can read and write data without a network connection. Collaboration — Multiple users can work on the same data concurrently. Longevity — Data remains accessible even if the software vendor ceases operation. Privacy — End-to-end encryption protects user data. User control — The vendor cannot restrict how users access or use their data. The paper surveyed existing approaches to data storage and collaboration — ranging from email attachments and Dropbox-style file synchronization to web applications and mobile backends — and argued that none of them satisfied all seven ideals simultaneously. === Role of CRDTs === The manifesto identified conflict-free replicated data types (CRDTs) as a promising technical foundation for local-first applications. CRDTs are data structures that allow multiple replicas to be edited independently and then merged without conflicts, a property first formalized in research by Marc Shapiro and colleagues around 2011. Kleppmann and collaborators at Ink & Switch developed Automerge, an open-source CRDT library for JSON documents, to make these algorithms available to application developers. == Adoption and community == Developer interest in the local-first approach grew after the 2019 paper spread on Hacker News and at developer conferences In August 2023, Wired published a feature article on the movement, describing it as an effort to reduce reliance on large cloud providers. The first Local-First Conf took place on 30 May 2024 in Berlin, with talks by Kleppmann and developers from companies including Linear and Anytype. The community has continued to expand, with regular "LoFi" meetups, a podcast (localfirst.fm), and a third edition of the conference planned for Berlin in July 2026. == Criticisms and limitations == Developers and commentators have pointed out practical difficulties with the local-first approach. Synchronizing data between multiple devices that may be offline for extended periods introduces complexity that cloud-based architectures avoid. Conflict resolution, even with CRDTs, can produce results that are technically consistent but semantically unexpected to users. Schema migrations across thousands of client devices running different application versions pose another difficulty that does not arise with server-side databases. Web browsers impose storage limits and may evict locally stored data. Safari, for instance, has been reported to clear IndexedDB data after seven days of inactivity on a given site, which undermines the assumption that local data is persistent. There is also disagreement within the local-first community about whether a fully decentralized architecture is required. The original manifesto described decentralization as the "logical end goal," but a number of products that identify as local-first still depend on centralized servers for authentication, backup, or synchronization. In a talk at Local-First Conf 2024, Kleppmann said the seven ideals are better understood as a "gradient" rather than a strict checklist.